Abstract



This paper considers the capacity of sub-sampled analog channels when the sampler is designed to operate independent of instantaneous channel realizations. A compound multiband Gaussian channel with unknown subband occupancy is considered, with perfect channel state information available at both the receiver and the transmitter. We restrict our attention to a general class of periodic sub-Nyquist samplers, which subsumes as special cases sampling with periodic modulation and filter banks.

We evaluate the loss due to channel-independent (universal) sub-Nyquist design through a sampled capacity loss metric, that is, the gap between the undersampled channel capacity and the Nyquist-rate capacity. We investigate sampling methods that minimize the worst-case (minimax) capacity loss over all channel states. A fundamental lower bound on the minimax capacity loss is first developed, which depends only on the band sparsity ratio and the undersampling factor, modulo a residual term that vanishes at high signal-to-noise ratio. We then quantify the capacity loss under Landau-rate sampling with periodic modulation and low-pass filters, when the Fourier coefficients of the modulation waveforms are randomly generated and independent (resp. i.i.d. Gaussian-distributed), termed independent random sampling (resp. Gaussian sampling). Our results indicate that with exponentially high probability, independent random sampling and Gaussian sampling achieve minimax sampled capacity loss in the Landau-rate and super-Landau-rate regimes, respectively. While identifying a deterministic minimax sampling scheme is in general intractable, our results highlight the power of randomized sampling methods, which are optimal in a universal design sense. Similar results and conclusions for a discrete-time sparse vector channel can be delivered as an immediate consequence of our analysis: independent random sensing matrices and i.i.d. Gaussian matrices are respectively minimax in the Landau-rate and super-Landau-rate regimes.
I. Introduction

The maximum rate of information that can be conveyed through an analog communication channel largely depends 
on the sampling technique and rate employed at the receiver end. In wideband communication systems, hardware 
and cost limitations often preclude sampling at or above the Nyquist rate, which presents a major bottleneck in 
transferring wideband and energy-efficient receiver design paradigms from theory to practice. Understanding the 
effects upon capacity of sub-Nyquist sampling is thus crucial in circumventing this bottleneck. 

In practice, receiver hardware and, in particular, sampling mechanisms are typically static and hence designed 
based on a family of possible channel realizations. During operation, the actual channel realization will vary over this 
class of channels. Since the sampler is typically integrated into the hardware and difficult to change during system 
operation, it needs to be designed independent of instantaneous channel state information (CSI). This has no effect 
if the sampling rate employed is commensurate with the maximum bandwidth (or the Nyquist rate) of the channel 
family. However, in the sub-Nyquist sampling rate regime, the sampler design significantly impacts the information
rate achievable over different channel realizations. As was shown in [1], the capacity-maximizing sub-Nyquist
sampling mechanism for a given linear time-invariant (LTI) channel depends on specific channel realizations. In
time-varying channels, sampled capacity loss relative to the Nyquist-rate capacity is necessarily incurred due to
channel-independent (universal) sub-Nyquist sampler design. Moreover, it turns out that the capacity-optimizing
sampler for a given channel structure might result in a very low data rate for other channel realizations.

In this paper, our goal is to explore universal design of a sub-Nyquist sampling system that is robust against 
the uncertainty and variation of instantaneous channel realizations, based on sampled capacity loss as a metric. 
In particular, we investigate the fundamental lower limit of sampled capacity loss in some overall sense (as will 
be detailed as minimax capacity loss in Section II-C), and design a sub-Nyquist sampling system for which the
capacity loss can be uniformly controlled and optimized over all possible channel realizations. 

A. Related Work 

In various scenarios, sampling at or above the Nyquist rate is not necessary for preserving signal information 
if certain signal structures are appropriately exploited [2], [3]. Take, for example, multiband signals, which reside
within several subbands over a wide spectrum. If the spectral support is known, then the sampling rate necessary
for perfect signal reconstruction is the spectral occupancy, termed the Landau rate [2]. Such signals admit perfect
recovery when sampled at rates approaching the Landau rate, assuming appropriately chosen sampling sets (e.g.
[5], [6]). Inspired by recent "compressive sensing" [7]-[9] ideas, spectrum-blind sub-Nyquist samplers have also
been developed for multiband signals [10], pulse streams [11], [12], etc. These sampling-theoretic works, however,
were not based on capacity as a metric in the sampler design. 

On the other hand, the Shannon-Nyquist sampling theorem has frequently been used to investigate analog 
waveform channels (e.g. [13]-[17]). One key paradigm to determine or bound the channel capacity is converting the
continuous-time channel into a set of parallel discrete-time channels, under the premise that sampling, when it is
performed at or above the Nyquist rate, preserves information. In addition, the effects upon capacity of oversampling
have been investigated in the presence of quantization [18], [19]. However, none of these works considered the
effect of reduced-rate sampling upon capacity. Another recent line of work [20] investigated the tradeoff between
sparse coding and subsampling in AWGN channels, but did not consider capacity-achieving input distributions.

Our recent work [1], [21] established a new framework for investigating the capacity of linear time-invariant (LTI)
Gaussian channels under a broad class of sub-Nyquist sampling strategies, including filter-bank and modulation-bank
sampling [10], [22] and, more generally, time-preserving sampling. We showed that periodic sampling or, more
simply, sampling with a filter bank, is sufficient to approach maximum capacity among all sampling structures under
a given sampling rate constraint, assuming that perfect CSI is available at both the receiver and the transmitter.

Practical communication systems often involve time-varying channels, e.g. wireless slow fading channels [16],
[23]. Many of these channels can be modeled as a channel with state (see a detailed survey in [23, Chapter 7]),
where the channel variation is captured by a state that may be fixed over a long transmission block, or, more simply,
a compound channel [24] whereby the channel realization lies within a collection of possible channels [25]. One
class of compound channel models concerns multiband Gaussian channels, whereby the instantaneous frequency
support active for transmission resides within several continuous intervals, spread over a wide spectrum. This model
naturally arises in several wideband communication systems, including time division multiple access systems and
cognitive radio networks, as will be discussed in Section II-A. However, to the best of our knowledge, no prior work
has investigated, from a capacity perspective, a universal (channel-independent) sub-Nyquist sampling paradigm that
is robust to channel variations in the above channel models.

Finally, we note that the design of optimal sampling / sensing matrices has recently been investigated in discrete-time
settings from an information-theoretic perspective. In particular, Donoho et al. [26] assert that random and
band-diagonal sampling systems admit perfect signal recovery from an information-theoretically minimal number
of samples. However, the optimality was not defined based on channel capacity as a metric, but instead based on
a fundamental rate-distortion limit [27].

B. Contribution 

In this paper, we consider a compound multiband channel, whereby the channel bandwidth W is divided into n
continuous subbands and, at each timeframe, only k subbands are active for transmission. We consider the class
of periodic sampling systems (i.e. systems that consist of a periodic preprocessor and a recurrent sampling set,
detailed in Section II-B) with period $n/W$ and sampling rate $f_s = mW/n$ for some integer m ($m \le n$). Under
this model, we define $\beta := k/n$ as the band sparsity ratio, and $\alpha := m/n$ as the undersampling factor. The sampling
mechanism is termed a Landau-rate sampling (resp. super-Landau-rate sampling) system if $f_s$ is equal to (resp.
greater than) the spectral size of the instantaneous channel support. Our contributions are as follows.

1) We derive, in Theorem 4, a fundamental lower bound on the largest sampled capacity loss (defined in Section
II) incurred by any channel-independent sampler, under both Landau-rate and super-Landau-rate sampling.
This lower bound depends only on the band sparsity ratio and the undersampling factor, modulo a small
residual term that vanishes when SNR and n increase. The bound is derived by observing that at each
frequency within $[0, W/n]$, the exponential sum of the capacity loss over all states s is independent of the
sampling system, except for a relatively small residual term that vanishes with SNR.

2) Theorem 5 characterizes the sampled capacity loss under a class of periodic sampling with periodic modulation
(of period $n/W$) and low-pass filters with passband $[0, W/n]$, when the Fourier coefficients of the modulation
waveforms are randomly generated and independent (termed independent random sampling). We demonstrate
that with exponentially high probability, the sampled capacity loss matches the fundamental lower bound of
Theorem 4 uniformly across all channel realizations. This implies that independent sampling achieves the
minimum worst-case (or minimax) capacity loss among all periodic sampling methods with period $n/W$. To be
more concrete, an independent random sampling system achieves minimax sampled capacity loss as long as
the Fourier coefficients of the modulation waveforms are independently sub-Gaussian [28] distributed with
matching moments up to the second order. This universality phenomenon occurs due to sharp concentration
of spectral measures of large random matrices [29].

3) For a large portion of the super-Landau-rate regime, we quantify the sampled capacity loss under independent
random sampling when the Fourier coefficients of the modulation waveforms are i.i.d. Gaussian-distributed
(termed Gaussian random sampling), as stated in Theorem 6. With exponentially high probability, Gaussian
random sampling achieves minimax capacity loss among all periodic sampling with period $n/W$.

4) Similar results for a discrete-time sparse vector channel with unknown channel support can be delivered as
an immediate consequence of our analysis framework. When the number of measurements is equal to the
channel support size, independent random sensing matrices are minimax in terms of channel-blind sampler
design. When the sample size exceeds the channel support size, Gaussian sensing matrices achieve minimax
sampled capacity loss.

C. Organization 

The remainder of this paper is organized as follows. In Section II, we introduce our system model of compound
multiband channels. A metric called sampled capacity loss, and a minimax sampler are defined with respect to
sampled channel capacity. We then determine in Section III the minimax capacity loss that is achievable within
the class of periodic sampling systems. Specifically, we develop a lower bound on the minimax capacity loss in
Section III-A. Its achievability under Landau-rate and super-Landau-rate sampling is treated in Section III-B and
Section III-C, respectively. Section IV-A summarizes the key observations and implications from our results. We
present in Section IV extensions to discrete-time sparse vector channels, and discuss connections with the compressed
sensing literature. Section VI closes the paper with a short summary of our findings and potential future directions.



D. Notation 

We define the following two functions: $\log^{\epsilon}(x) := \log(\max(\epsilon, x))$ and $\det^{\epsilon}(X) := \prod_i \max(\lambda_i(X), \epsilon)$. Denote
by $\mathcal{H}(\beta) := -\beta\log\beta - (1-\beta)\log(1-\beta)$ the binary entropy function, and by $\mathcal{H}(\{x_1, \dots, x_n\}) := -\sum_{i=1}^{n} x_i \log x_i$
the more general entropy function. The standard notation $f(n) = O(g(n))$ means there exists a constant c (not
necessarily positive) such that $|f(n)| \le c\,g(n)$. For a matrix A, we use $A_{i*}$ and $A_{*i}$ to denote the ith row and ith
column of A, respectively. We let $[n]$ denote the set $\{1, 2, \dots, n\}$. For any set $A \subseteq [n]$, we denote by $\binom{A}{k}$ the
set of all k-combinations of A. In particular, we write $\binom{[n]}{k}$ for the set of all k-element subsets of $\{1, 2, \dots, n\}$.
We also use card(A) to denote the cardinality of a set A. Let W be a $p \times p$ random matrix that can be expressed
as $W = \sum_{i=1}^{n} Z_i Z_i^{\top}$, where $Z_i \sim \mathcal{N}(0, \Sigma)$ are jointly independent vectors. Then W is said to have a central
Wishart distribution with n degrees of freedom and covariance matrix $\Sigma$, denoted as $W \sim \mathcal{W}_p(n, \Sigma)$. Our notation
is summarized in Table I.

Table I
Summary of Notation and Parameters

binary entropy function, i.e. $\mathcal{H}(x) = -x\log x - (1-x)\log(1-x)$
impulse response, and frequency response of the LTI analog channel
power spectral density of the noise $\eta(t)$
aggregate sampling rate, and the corresponding sampling interval ($T_s = 1/f_s$)
channel bandwidth, size of instantaneous channel support
number of subbands, number of sampling branches, number of subbands being simultaneously active
undersampling factor, sparsity ratio
sampling matrix, whitened sampling matrix
capacity loss associated with a sampling matrix Q given state s
$\log^{\epsilon}(x) := \log(\max(\epsilon, x))$, $\det^{\epsilon}(X) := \prod_i \max(\lambda_i(X), \epsilon)$
ith row of A, ith column of A
cardinality of a set A
$[n] := \{1, 2, \dots, n\}$
set of all k-element subsets of A, set of all k-element subsets of $[n]$
$p \times p$-dimensional central Wishart distribution with n degrees of freedom and covariance matrix $\Sigma$
set of integers, set of real numbers
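The two truncated functions defined at the start of this subsection can be sketched numerically. The snippet below is a minimal illustration, assuming that the truncation level (written `eps` here) is a positive constant and that the matrix argument is Hermitian; the names `log_eps` and `det_eps` are hypothetical and do not come from the paper.

```python
import math
import numpy as np

def log_eps(x, eps):
    """Truncated logarithm: log(max(eps, x))."""
    return math.log(max(eps, x))

def det_eps(X, eps):
    """Truncated determinant: product of max(lambda_i(X), eps) over eigenvalues."""
    eigvals = np.linalg.eigvalsh(X)          # real eigenvalues; assumes X is Hermitian
    return float(np.prod(np.maximum(eigvals, eps)))

X = np.diag([2.0, 1e-6])
print(det_eps(X, eps=0.5))    # the eigenvalue 1e-6 is clipped up to 0.5
print(log_eps(0.1, eps=1.0))  # the argument 0.1 is clipped up to 1.0
```

Clipping the small eigenvalues keeps the truncated determinant bounded away from zero, which is presumably why such functions are convenient when a matrix may be close to singular.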

II. Problem Formulation and Preliminaries 
A. Compound Multiband Channel

We consider a compound multiband Gaussian channel. The channel has a total bandwidth W, and is divided
into n continuous subbands each of bandwidth W/n. A state $s \in \binom{[n]}{k}$ is generated, which dictates the channel
support and realization¹. Specifically, given a state s, the channel is an LTI filter with impulse response $h_s(t)$
and frequency response $H_s(f) = \int_{-\infty}^{\infty} h_s(t)\exp(-j2\pi f t)\,dt$. It is assumed throughout that there exists a general
function $H(f, s)$ such that for every f and s, $H_s(f)$ can be expressed as

$$H_s(f) = H(f, s)\,\mathbb{1}_s(f), \qquad \mathbb{1}_s(f) := \begin{cases} 1, & \text{if } f \text{ lies within subbands at indices from } s, \\ 0, & \text{else.} \end{cases}$$

¹Note that in practice, n is typically a large number. For instance, the number of subcarriers ranges from 128 to 2048 in LTE

A transmit signal $x(t)$ with a power constraint P is passed through this multiband channel, which yields a channel
output

$$r_s(t) = h_s(t) * x(t) + \eta(t), \quad (1)$$

where $\eta(t)$ is stationary zero-mean Gaussian noise with power spectral density $S_\eta(f)$. We assume that perfect CSI
is available at both the transmitter and the receiver.

The above model subsumes as special cases the following communication scenarios. 

• Time Division Multiple Access Model. In this setting the channel is shared by a set of different users. At 
each timeframe, one of the users is selected for transmission. The receiver (e.g. the base station) allocates a 
subset of subbands to the designated sender over that timeframe. 

• White-Space Cognitive Radio Network. In a white-space cognitive radio network, cognitive users exploit 
spectrum holes unoccupied by primary users and utilize them for communications. Since the locations of the 
spectrum holes change over time, the spectral subbands available to cognitive users vary over time.
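The state alphabet of the compound multiband channel can be enumerated directly for small n. The following sketch assumes 0-based subband indices and a frequency axis $[0, W)$; the helper names `channel_states` and `support_indicator` are hypothetical and only illustrate how a state selects the active subbands.

```python
from itertools import combinations
from math import comb

def channel_states(n, k):
    """All k-element subsets of {0, ..., n-1} (0-based subband indices)."""
    return list(combinations(range(n), k))

def support_indicator(f, s, n, W):
    """1 if frequency f in [0, W) lies in a subband indexed by s, else 0."""
    if not 0 <= f < W:
        return 0
    subband = int(f * n / W)      # index of the width-W/n subband containing f
    return 1 if subband in s else 0

n, k, W = 6, 2, 1.0
states = channel_states(n, k)
print(len(states), comb(n, k))    # the alphabet has C(n, k) states
s = states[0]                     # (0, 1): the two lowest subbands are active
print([support_indicator(f, s, n, W) for f in (0.05, 0.20, 0.50)])
```

Because the number of states grows as a binomial coefficient, exhaustive enumeration of this kind is only practical for small n.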

B. Sampled Channel Capacity 

We aim to design a sampler that works at rates below the Nyquist rate (i.e. the channel bandwidth W). In 
particular, we consider the class of periodic sampling systems, which subsume the most widely used sampling 
mechanisms in practice. 

1) Periodic Sampling: The class of periodic sampling systems is defined in [1, Section IV], which we restate as
follows.

Definition 1 (Periodic Sampling). Consider a sampling system consisting of a preprocessor with an impulse
response $q(t, \tau)$ followed by a sampling set $\Lambda = \{t_k \mid k \in \mathbb{Z}\}$. A linear sampling system is said to be periodic with
period $T_q$ and sampling rate $f_s$ ($f_s T_q \in \mathbb{Z}$) if for every $t, \tau \in \mathbb{R}$ and every $k \in \mathbb{Z}$, we have

$$q(t, \tau) = q(t + T_q, \tau + T_q); \qquad t_{k + f_s T_q} = t_k + T_q. \quad (2)$$
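The condition on the sampling set in Definition 1 can be checked on a small recurrent (periodic nonuniform) sampling pattern. The sketch below is illustrative only: the period, the offsets, and the helper name `recurrent_sampling_set` are assumed values chosen for the example, and the periodicity condition on the preprocessor is not modeled.

```python
from fractions import Fraction

def recurrent_sampling_set(offsets, T_q, num_periods):
    """Sampling instants t_k built by repeating `offsets` in every period T_q."""
    return [p * T_q + o for p in range(num_periods) for o in sorted(offsets)]

T_q = Fraction(4)                                      # period (assumed, arbitrary units)
offsets = [Fraction(0), Fraction(1, 2), Fraction(3)]   # 3 samples per period
t = recurrent_sampling_set(offsets, T_q, num_periods=5)

samples_per_period = len(offsets)       # this integer plays the role of f_s * T_q
f_s = samples_per_period / T_q          # average sampling rate
periodic = all(t[i + samples_per_period] == t[i] + T_q
               for i in range(len(t) - samples_per_period))
print(f_s, periodic)
```

Exact rational arithmetic is used so that the equality test on sampling instants is not affected by floating-point rounding.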

Consider a periodic sampling system V with period $T_q = n/W$ and sampling rate $f_s := mW/n$ for some integer
m. A special case consists of sampling with a combination of filter banks and periodic modulation with period
n/W, as illustrated in Fig. 1(a). Specifically, the sampling system comprises m branches, where at each branch, the
channel output is passed through a pre-modulation filter, modulated by a periodic waveform of period $T_q$, and then
filtered with a post-modulation filter followed by uniform sampling at rate $f_s/m$. The channel capacity in a sub-sampled
LTI Gaussian channel has been derived in [1, Theorem 5].

Figure 1. (a) Sampling with modulation and filter banks. The channel output $r(t)$ is passed through m branches, each consisting of a pre-modulation
filter, a periodic modulator and a post-modulation filter followed by a uniform sampler with sampling rate $f_s/m$. (b) Sampling with
a modulation bank and low-pass filters. The channel output $r(t)$ is passed through m branches, each consisting of a modulator with modulation
waveform $q_i(t)$ and a low-pass filter of pass band $[0, f_s/m]$ followed by a uniform sampler with sampling rate $f_s/m$.

As will be shown later, in the high SNR regime,
employing water-filling power allocation harvests only marginal capacity gain relative to equal power allocation. For
this reason and for mathematical convenience, we only restate below the sampled channel capacity under uniform
power allocation, which suffices to demonstrate the fundamental minimax limit as well as the convergence rate
with SNR. Specifically, if we denote by $s_i$ and $f_i$ the ith smallest element in s and the lowest frequency of the
ith subband, respectively, and define $\mathbf{H}_s(f)$ as a $k \times k$ diagonal matrix obeying

$$(\mathbf{H}_s(f))_{ii} = \frac{|H_s(f_{s_i} + f)|}{\sqrt{S_\eta(f_{s_i} + f)}},$$

then the sampled channel capacity, when specialized to our setting, is given as follows.

Theorem 1 (Sampled Capacity with Equal Power Allocation [1]). Consider a channel with total bandwidth W
and instantaneous band sparsity ratio $\beta := k/n$. Assume perfect CSI at both the transmitter and the receiver, and
equal power allocation employed over active subbands. If a periodic sampler V with period n/W and sampling
rate $f_s = \frac{m}{n} W$ is employed, then the sampled channel capacity at a given state s is given by

$$C_s^{Q} = \frac{1}{2} \int_0^{W/n} \log\det\left( I_m + \frac{P}{\beta W}\, Q_s^{w}(f)\, \mathbf{H}_s^2(f)\, Q_s^{w*}(f) \right) df, \quad (3)$$

where $Q^{w}(f) := (Q(f) Q^*(f))^{-1/2} Q(f)$. Here, $Q(f)$ is an $m \times n$ matrix that only depends on V, and $Q_s(f)$
denotes the submatrix consisting of the columns of $Q(f)$ at indices of s.
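The whitening step that appears in Theorem 1 can be illustrated for a frequency-flat sampling matrix. This is a minimal sketch, assuming a full-row-rank real matrix and 0-based column indices; the function name `whiten`, the matrix sizes, and the example state are hypothetical choices for illustration.

```python
import numpy as np

def whiten(Q):
    """Return (Q Q^*)^{-1/2} Q for a full-row-rank m x n matrix Q."""
    gram = Q @ Q.conj().T                        # m x m, Hermitian positive definite
    eigvals, eigvecs = np.linalg.eigh(gram)
    inv_sqrt = (eigvecs / np.sqrt(eigvals)) @ eigvecs.conj().T
    return inv_sqrt @ Q

rng = np.random.default_rng(0)                   # fixed seed, illustrative only
m, n = 3, 8
Q = rng.standard_normal((m, n))
Qw = whiten(Q)
print(np.allclose(Qw @ Qw.conj().T, np.eye(m)))  # whitened rows are orthonormal

s = (1, 4, 6)                                    # hypothetical state, 0-based indices
Qw_s = Qw[:, list(s)]                            # columns of the whitened matrix at s
print(Qw_s.shape)
```

The inverse square root is formed from an eigendecomposition of the Gram matrix, which is well defined whenever the sampling matrix has full row rank.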

In general, $Q(f)$ is a function that varies with f. Unless otherwise specified, we call $Q(\cdot)$ the sampling coefficient
function and $Q^{w}(\cdot)$ the whitened sampling coefficient function with respect to the sampling system V. Note that
$Q^{w}(f) Q^{w*}(f) = I$.

2) Flat Sampling Coefficient Function: A special class of periodic sampling concerns the ones whose $Q(\cdot)$ are
flat over $[0, f_s/m]$, in which case we can use an $m \times n$ matrix Q to represent the sampling coefficient function,
termed a sampling coefficient matrix. This class of sampling systems can be realized through the m-branch sampling
system illustrated in Fig. 1(b). In the ith branch, the channel output is modulated by a periodic waveform $q_i(t)$ of
period n/W, passed through a low-pass filter with pass band $[0, f_s/m]$, and then uniformly sampled at rate $f_s/m$,
where the Fourier transform of $q_i(t)$ obeys $\mathcal{F}(q_i(t)) = \sum_{l=1}^{n} Q_{i,l}\, \delta(f - lW/n)$. In this paper, a sampling system
within this class is said to be (independent) random sampling if the entries of Q are randomly generated (and are
independent). In addition, a sampling system is termed Gaussian sampling if the entries of Q are i.i.d. Gaussian
distributed.

It turns out that this simple class of sampling structures is sufficient to achieve overall robustness in terms of
sampled capacity loss, provided that the entries of Q are sub-Gaussian with zero mean and unit variance, as will
be detailed in Section III.

C. Universal Sampling 

As was shown in [1], the optimal sampling mechanism for a given LTI channel with perfect CSI extracts out
a frequency set with the highest SNR and hence suppresses aliasing. Such an alias-suppressing sampler may achieve
very low capacity for some channel realizations. In this paper, we desire a sampler that operates independent of the
instantaneous CSI, and our objective is to design a single linear sampling system that achieves to within a minimal
gap of the Nyquist-rate capacity across all possible channel realizations. The availability of CSI to the transmitter,
the receiver and the sampler is illustrated in Fig. 2.

Figure 2. At each timeframe, a state is generated from a finite set $\mathcal{S}$, which dictates the channel realization $H_s(f)$. Both the transmitter and
the receiver have perfect CSI, while the sampler operates independently of s.



1) Sampled Capacity Loss: Universal sub-Nyquist samplers suffer from information rate loss relative to Nyquist- 
rate capacity. In this subsection, we make formal definitions of this metric. 

For any state s, when equal power allocation is performed over active subbands, the Nyquist-rate capacity can
be written as

$$C_s^{P_{\mathrm{eq}}} = \frac{1}{2} \int_0^{W/n} \log\det\left( I_k + \frac{P}{\beta W}\, \mathbf{H}_s^2(f) \right) df, \quad (4)$$

which is a special case of (3). In contrast, if power control at the transmitter side is allowed, then the Nyquist-rate
capacity $C_s^{\mathrm{opt}}$ is given by the water-filling expression (5), where the water level $\nu$ is determined by the
power-constraint equation (6).

We can then make formal definitions of sampled capacity loss as follows.

Definition 2 (Sampled Capacity Loss). For any sampling coefficient function $Q(\cdot)$ and any given state s, we
define the sampled capacity loss without power control as

$$L_s^{Q} := C_s^{P_{\mathrm{eq}}} - C_s^{Q},$$

and define the sampled capacity loss with optimal power control as

$$L_s^{Q,\mathrm{opt}} := C_s^{\mathrm{opt}} - C_s^{Q,\mathrm{opt}},$$

where $C_s^{P_{\mathrm{eq}}}$ and $C_s^{\mathrm{opt}}$ denote the Nyquist-rate capacities under equal power allocation and under optimal
power control, respectively, and $C_s^{Q,\mathrm{opt}}$ denotes the sampled capacity under optimal power control.

These metrics quantify the capacity gaps relative to Nyquist-rate capacity due to universal (channel-independent)
sub-Nyquist sampling design. When sampling is performed at or above the Landau rate (which is equal to $kW/n$ in
our case) but below the Nyquist rate, these gaps capture the rate loss due to channel-independent sampling relative
to channel-optimized design, both with and without power control.

For notational convenience, for an $m \times n$ matrix M, we denote by $L_s^{M}$ and $L_s^{M,\mathrm{opt}}$ the capacity loss with respect
to a sampling coefficient function $Q(f) = M$, which is flat across $[0, W/n]$.

2) Minimax Sampler: Frequently used in the theory of statistics (e.g. [32]), minimaxity is a metric that seeks to
minimize the loss function in some overall sense, defined as follows.

Definition 3 (Minimax Sampler). A sampling system associated with a sampling coefficient function $Q^{\mathrm{m}}$, which
minimizes the worst-case capacity loss, that is, which satisfies

$$\max_{s \in \binom{[n]}{k}} L_s^{Q^{\mathrm{m}}} = \inf_{Q(\cdot)} \max_{s \in \binom{[n]}{k}} L_s^{Q},$$

is called a minimax sampler with respect to the state alphabet $\binom{[n]}{k}$.

The minimax criterion is of interest for designing a sampler robust to all possible channel states, that is, achieving
to within a minimal gap relative to maximum capacity for all channel realizations. It aims to control the rate
loss across all states in a uniform manner, as illustrated in Fig. 3. Note that the minimax sampler is in general
different from the one that maximizes the lowest capacity among all states (worst-case capacity). While the latter
guarantees an optimal worst-case capacity that can be achieved regardless of which channel is realized, it may
result in significant capacity loss in many states with large Nyquist-rate capacity, as illustrated in Fig. 3. In contrast,
a desired minimax sampler controls the capacity loss for every single state s, and allows for robustness over all
channel states with universal channel-independent sampling. It turns out that in the compound multiband channels,

$$\forall s: \quad L_s^{Q^{\mathrm{m}}} = \max_{\tilde{s} \in \binom{[n]}{k}} L_{\tilde{s}}^{Q^{\mathrm{m}}},$$

except for some vanishingly small residual terms, as will be shown in the next section.

Figure 3. Minimax sampler vs. the sampler that maximizes worst-case capacity, when sampling is channel-independent and performed below the
Nyquist rate. The blue solid line represents the Nyquist-rate (analog) capacity, the black dotted line represents the capacity achieved by the minimax
sampler, the orange dashed line illustrates the Nyquist-rate capacity minus the minimax capacity loss, while the purple dashed line corresponds to
maximum worst-case capacity.
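The contrast between a channel-specific sampler and a universal one can be sketched numerically for a small system. The example below makes several simplifying assumptions that are not taken from the paper: W is normalized to 1, the channel is flat with unit gain on every active subband, the noise has unit power spectral density, and `snr` stands for the per-subband SNR. The "selector" matrix, which keeps only the first m subbands, is a stand-in for an alias-suppressing design; all function names are hypothetical.

```python
import math
from itertools import combinations
import numpy as np

def whiten(Q):
    """Whitened matrix (Q Q^*)^{-1/2} Q for a full-row-rank Q."""
    eigvals, eigvecs = np.linalg.eigh(Q @ Q.conj().T)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.conj().T @ Q

def capacity_loss(Q, s, snr):
    """Loss in nats for W = 1, a flat unit-gain channel and unit-PSD noise."""
    m, n = Q.shape
    Qw_s = whiten(Q)[:, list(s)]                 # columns of the active subbands
    nyquist = len(s) * math.log1p(snr)           # log det(I_k + snr * I_k)
    _, sampled = np.linalg.slogdet(np.eye(m) + snr * (Qw_s @ Qw_s.conj().T))
    return (nyquist - sampled) / (2 * n)         # factor 1/2 and subband width 1/n

def worst_case_loss(Q, k, snr):
    """Largest loss over all k-element states."""
    n = Q.shape[1]
    return max(capacity_loss(Q, s, snr) for s in combinations(range(n), k))

n, m, k, snr = 6, 2, 2, 1e4
selector = np.eye(m, n)                          # keeps only the first m subbands
gaussian = np.random.default_rng(1).standard_normal((m, n))
print(worst_case_loss(selector, k, snr), worst_case_loss(gaussian, k, snr))
```

Under these assumptions the selector loses the entire Nyquist-rate capacity whenever the active subbands fall outside the ones it keeps, whereas the Gaussian matrix retains some information in every state.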

III. Minimax Sampled Capacity Loss 

The minimax sampled capacity loss problem can be cast as minimizing $\max_{s \in \binom{[n]}{k}} L_s^{Q}$ over all sampling coefficient
functions $Q(f)$. In general, the problem is non-convex in $Q(f)$, and hence it is difficult to identify the optimal
sampling systems. Nevertheless, the minimax capacity loss can be quantified reasonably well at moderate-to-high 
SNR. It turns out that under both Landau-rate sampling and a large class of super-Landau-rate sampling, the minimax 
capacity loss can be approached arbitrarily well by random sampling. 

Define the undersampling factor $\alpha := m/n$, and recall that the band sparsity ratio is $\beta := k/n$. Our main results
are summarized in the following theorem. 

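Theorem 2. Consider any sampling coefficient function $Q(\cdot)$ with an undersampling factor $\alpha$, and let the sparsity
ratio be $\beta$. Define

$$\mathrm{SNR}_{\min} := \frac{P}{\beta W} \inf_{0 \le f \le W,\ s \in \binom{[n]}{k}} \frac{|H(f,s)|^2}{S_\eta(f)} \quad \text{and} \quad \mathrm{SNR}_{\max} := \frac{P}{\beta W} \sup_{0 \le f \le W,\ s \in \binom{[n]}{k}} \frac{|H(f,s)|^2}{S_\eta(f)}.$$

(i) (Landau-rate sampling) If $\alpha = \beta$ (or k = m), then

$$\inf_{Q} \max_{s \in \binom{[n]}{k}} L_s^{Q} = \frac{W}{2} \left( \mathcal{H}(\beta) + o(1) + \Delta_L \right), \quad (7)$$

where the $o(1)$ term vanishes as n grows and

$$-\frac{2}{\sqrt{\mathrm{SNR}_{\min}}} \le \Delta_L \le \frac{\beta}{\mathrm{SNR}_{\min}}.$$

(ii) (Super-Landau-rate sampling) Suppose that there is a small constant $\delta > 0$ such that $\alpha - \beta \ge \delta$ and
$1 - \alpha - \beta \ge \delta$. Then

$$\inf_{Q} \max_{s \in \binom{[n]}{k}} L_s^{Q} = \frac{W}{2} \left( \mathcal{H}(\beta) - \alpha \mathcal{H}\!\left(\frac{\beta}{\alpha}\right) + o(1) + \Delta_{SL} \right), \quad (8)$$

where the $o(1)$ term vanishes as n grows and

$$-\frac{2}{\sqrt{\mathrm{SNR}_{\min}}} \le \Delta_{SL} \le \frac{\beta}{\mathrm{SNR}_{\min}}.$$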

Remark 1. Note that $\mathcal{H}(\cdot)$ denotes the entropy function. Its appearance is due to the fact that it is an asymptotically
tight estimate of the rate function of binomial coefficients. 
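The statement in Remark 1 can be checked numerically: the normalized logarithm of a binomial coefficient approaches the binary entropy of the corresponding ratio. The sketch below also evaluates the entropy combination that appears in Theorem 2, read here as $\mathcal{H}(\beta) - \alpha \mathcal{H}(\beta/\alpha)$, against a ratio of two binomial coefficients; pairing it with $\binom{n}{k}/\binom{m}{k}$ is an assumption made for illustration, and the function names are hypothetical.

```python
from math import comb, log

def binary_entropy(x):
    """H(x) = -x log x - (1 - x) log(1 - x) in nats, with 0 log 0 := 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log(x) - (1 - x) * log(1 - x)

def entropy_main_term(alpha, beta):
    """H(beta) - alpha * H(beta / alpha); reduces to H(beta) when alpha == beta."""
    return binary_entropy(beta) - alpha * binary_entropy(beta / alpha)

n, k, m = 1000, 200, 400
beta, alpha = k / n, m / n

# (1/n) log C(n, k) versus H(beta)
print(log(comb(n, k)) / n, binary_entropy(beta))

# (1/n) [log C(n, k) - log C(m, k)] versus H(beta) - alpha * H(beta / alpha)
print((log(comb(n, k)) - log(comb(m, k))) / n, entropy_main_term(alpha, beta))
```

For these values the two quantities in each pair agree to within a few thousandths, and the gap shrinks further as n grows with the ratios held fixed.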

Theorem 2 provides a tight characterization of the minimax sampled capacity loss relative to the Nyquist-rate
capacity, under both Landau-rate sampling and super-Landau-rate sampling. Note that the Landau-rate sampling
regime in (i) is not a special case of the super-Landau-rate regime considered in (ii). For instance, if $\beta > 1/2$, then
$\alpha + \beta > 1$, which falls within a regime not accounted for by Theorem 2(ii).

The expressions (7) and (8) contain residual terms, measured per unit bandwidth, that vanish for large n and
high SNR. These fundamental minimax limits do not scale with the SNR and n modulo
a vanishing residual term. Since the Nyquist-rate capacity scales as $\Theta(W \log \mathrm{SNR})$, our results indicate that the
ratio of the minimax capacity loss to the Nyquist-rate capacity vanishes at a rate $\Theta(1/\log \mathrm{SNR})$.
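The decay of the relative loss with SNR can be illustrated with a back-of-the-envelope computation. This sketch assumes Landau-rate sampling, a flat unit-gain channel with equal power allocation, and keeps only the leading entropy term of the loss while neglecting all residual terms; the function names are hypothetical.

```python
from math import log, log1p

def binary_entropy(x):
    """Binary entropy in nats for 0 < x < 1."""
    return -x * log(x) - (1 - x) * log(1 - x)

def loss_to_capacity_ratio(beta, snr):
    """Approximate ratio of the leading loss term to the Nyquist-rate capacity."""
    loss = 0.5 * binary_entropy(beta)       # leading term per unit bandwidth
    capacity = 0.5 * beta * log1p(snr)      # flat channel, equal power allocation
    return loss / capacity

beta = 0.25
for snr_db in (10, 20, 30, 40):
    snr = 10 ** (snr_db / 10)
    print(snr_db, round(loss_to_capacity_ratio(beta, snr), 4))
```

Under these assumptions the ratio equals $\mathcal{H}(\beta) / (\beta \log(1 + \mathrm{SNR}))$, so it decays inversely with the logarithm of the SNR while the absolute loss stays fixed.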

Note that even if we allow power control at the transmitter side, the results are still valid at high SNR. This is
summarized in the following theorem.

Theorem 3. Consider the metric $L_s^{Q,\mathrm{opt}}$ with power control. Under all other conditions of Theorem 2, the bounds
(7) and (8) (with $L_s^{Q}$ replaced by $L_s^{Q,\mathrm{opt}}$) continue to hold if $\Delta_L$ and $\Delta_{SL}$ are respectively replaced by some
residuals $\Delta_L^{\mathrm{opt}}$ and $\Delta_{SL}^{\mathrm{opt}}$ that satisfy

$$-\frac{2}{\sqrt{\mathrm{SNR}_{\min}}} \le \Delta_L^{\mathrm{opt}},\ \Delta_{SL}^{\mathrm{opt}} \le \frac{\beta + A}{\mathrm{SNR}_{\min}},$$

where A is a constant defined as 

max ,, ,. (^ |g(A«)l' Hf Sim ,, ,, \H(f,s)\^ 



Theorem 3 demonstrates that if the average-to-minimum SNR ratio (where the average is taken over frequency
and maximized over states) is bounded by a constant, then the minimax sampled capacity gap with power control
remains almost the same as that without power control, up to a residual per unit bandwidth that vanishes for large n
and high SNR. Note that the constant A given in Theorem 3 is fairly conservative, and can be refined with finer
tuning or algebraic techniques.

Theorem 3 can be delivered as an immediate consequence of Theorem 2 if we can quantify the gap between
$C_s^{P_{\mathrm{eq}}}$ and $C_s^{\mathrm{opt}}$. In fact, the capacity benefit of using power control in the high-SNR regime is bounded by a
residual per unit bandwidth that vanishes as the SNR grows. See the appendix for details. For this reason, our analysis is
mainly devoted to $L_s^{Q}$, which
corresponds to the capacity loss relative to Nyquist-rate capacity with uniform power allocation.



The proof of Theorem 2 involves the verification of two parts: a converse part that provides a fundamental lower
bound on the minimax sampled capacity loss, and an achievability part that provides a sampling scheme to approach
this bound. As we show, the class of sampling systems with periodic modulation followed by low-pass filters, as
illustrated in Fig. 1(b), is sufficient to approach the minimax sampled capacity loss.

Throughout the remainder of the paper, we suppose that the noise is of unit power spectral density $S_\eta(f) = 1$
unless otherwise specified. Note that this incurs no loss of generality since we can always include a noise-whitening 
LTI filter at the first stage of the sampling system. 

A. The Converse 

We need to show that the minimax sampled capacity loss under any channel-independent sampler cannot be
lower than (7) and (8). This is given by the following theorem, which takes into account the entire regime including
the situation where $\alpha + \beta > 1$.

Theorem 4. Consider any Riemann-integrable sampling coefficient function $Q(\cdot)$ with an undersampling factor
$\alpha := m/n$. Suppose the sparsity ratio $\beta := k/n$ satisfies $\beta \le \alpha \le 1$. The minimax capacity loss can be lower
bounded by

For a given $\beta$, the bound is decreasing in $\alpha$. While the active channel bandwidth is smaller than the total
bandwidth, the noise (even though the SNR is large) is scattered over the entire bandwidth. Thus, none of the 
universal sub-Nyquist sampling strategies are information preserving, and increasing the sampling rate can always 
harvest capacity gain. 
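One way to see why a sum over all states can be independent of the sampling system is the Cauchy-Binet identity. The sketch below is an illustration under assumptions not stated in the paper: the Landau-rate case m = k, a frequency-flat real sampling matrix, W normalized to 1, and a high-SNR approximation in which the loss of a state is governed by the squared determinant of the corresponding whitened submatrix. The function name `whiten` is hypothetical.

```python
from itertools import combinations
from math import comb, log
import numpy as np

def whiten(Q):
    """Whitened matrix (Q Q^*)^{-1/2} Q for a full-row-rank Q."""
    eigvals, eigvecs = np.linalg.eigh(Q @ Q.conj().T)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.conj().T @ Q

rng = np.random.default_rng(0)               # fixed seed, illustrative only
n, k = 8, 3                                  # Landau-rate case: m = k
Qw = whiten(rng.standard_normal((k, n)))

# Cauchy-Binet: the sum over all k-subsets s of |det(Qw_s)|^2 equals
# det(Qw Qw^*), which is 1 for any whitened matrix.
dets_sq = [abs(np.linalg.det(Qw[:, list(s)])) ** 2
           for s in combinations(range(n), k)]
total = sum(dets_sq)
print(round(total, 10))

# The smallest |det|^2 cannot exceed the average 1 / C(n, k), so under the
# high-SNR approximation the worst state loses at least about
# log(C(n, k)) / (2n) nats.
print(min(dets_sq) <= 1 / comb(n, k), log(comb(n, k)) / (2 * n))
```

Since the total is fixed regardless of how the matrix is chosen, making some states better necessarily makes others worse, which is consistent with a lower bound that depends only on the number of states.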

B. Achievability with Landau-rate Sampling ($\alpha = \beta$)

Consider the achievability part when the sampling rate equals the active frequency bandwidth ($\beta = \alpha$). In general,
it is very difficult to find a deterministic solution to approach the lower bound (9). A special instance of sampling
methods that we can analyze concerns the case in which the sampling coefficient functions are flat over $[0, W/n]$
and whose coefficients are generated in a random fashion. It turns out that as n grows large, the capacity loss
achievable by random sampling approaches the lower bound (9) uniformly across all realizations. The results are
stated in Theorem 5 after introducing a class of sub-Gaussian measures below.

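Definition 4. A measure $\nu$ on $\mathbb{R}$ satisfies the logarithmic Sobolev inequality with constant $c_{\mathrm{LS}}$ if, for any
differentiable function $g$,

$$\int g^2 \log \frac{g^2}{\int g^2 \, \mathrm{d}\nu} \, \mathrm{d}\nu \le 2 c_{\mathrm{LS}} \int |g'|^2 \, \mathrm{d}\nu.$$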



Remark 2. A probability measure obeying the logarithmic Sobolev inequality possesses sub-Gaussian tails, and a 
large class of sub-Gaussian measures satisfies this inequality for some constant. See ^2^ for examples. In particular, 
the standard Gaussian measure satisfies this inequality with constant cls = 1 (^-g- /p5Vj. 



Theorem 5. Let $M = (\zeta_{ij})_{1 \le i \le k, 1 \le j \le n}$ be a random matrix in which the $\zeta_{ij}$'s are jointly independent with zero mean
and unit variance. In addition, suppose that $\zeta_{ij}$ satisfies one of the following conditions:

(a) $\zeta_{ij}$ is almost surely bounded by a constant $D$;

(b) The probability measure of $\zeta_{ij}$ satisfies the logarithmic Sobolev inequality with a bounded constant $c_{\mathrm{LS}}$.
Then there exist some constants $c, C > 0$ such that

n^xLr<f fH(^) + ^^ + -^) (10) 

se(H) 2 V n SNRmin/ 

with probability exceeding $1 - C\exp(-cn)$.

Theorem 5 demonstrates that independent random sampling achieves minimax sampled capacity loss among
all periodic sampling methods with period n/W . In fact, our analysis demonstrates that the sampled capacity 
loss approaches the minimax limit uniformly over all states s. Another interesting observation is the universality 
phenomenon, i.e. a broad class of sub-Gaussian ensembles, as long as the entries are jointly independent with 
matching moments, suffices to generate minimax samplers. 



C. Achievability with Super-Landau-Rate Sampling ($\alpha > \beta$, $\alpha + \beta < 1$)

So far we have considered the case in which the sampling rate is equal to the spectral support. While the 
active bandwidth for transmission is smaller than the total bandwidth, the noise (even though the SNR is large) is 
scattered over the entire bandwidth. This indicates that none of the sub-Nyquist sampling strategies preserves all 
information contents conveyed through the noisy channel, unless they know the channel support. One may thus 
hope that increasing the sampling rate would improve the achievable information rate. The achievability result for 
super-Landau-rate sampling is stated in the following theorem. 

Theorem 6. Let $M = (\zeta_{ij})_{1 \le i \le m, 1 \le j \le n}$ be a random Gaussian matrix in which the $\zeta_{ij}$'s are i.i.d. drawn from
$\mathcal{N}(0, 1)$. Suppose that there exists a small constant $\epsilon > 0$ such that $1 - \alpha - \beta > \epsilon$ and $\alpha - \beta > \epsilon$. Then there exist
some constants $c, C > 0$ such that



max L^ < 

«e(H) 



w 



H(/3)-aH(^Uo^^°s''^^ ■ ^ 



a \ x/Tl SNR 



^min 



with probability exceeding 1 — Cexp (—en). 

Theorem 6 indicates that i.i.d. Gaussian sampling approaches the minimax capacity loss (8) with vanishingly
small gap. As will be shown in our proof, with exponentially high probability, the sampled capacity losses for all
states are equivalent and coincide with the fundamental minimax limit. In contrast to Theorem 5, we restrict our
attention to Gaussian sampling, which suffices for the proof of Theorem 2.
D. Equivalent Algebraic Problems

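Our results are established by investigating three equivalent algebraic problems. Recall that $\frac{P}{\beta W} H_s^2(f) \succeq \mathrm{SNR}_{\min} I_k$.

Define $Q_s^{\mathrm{w}} := (QQ^*)^{-1/2} Q_s$. Simple manipulations yield

$$
\begin{aligned}
L_s^Q &= -\frac{1}{2}\int_0^{W/n} \log\det\left(I_m + \frac{P}{\beta W} Q_s^{\mathrm{w}}(f) H_s^2(f) Q_s^{\mathrm{w}*}(f)\right) \mathrm{d}f + \frac{1}{2}\int_0^{W/n} \log\det\left(I_k + \frac{P}{\beta W} H_s^2(f)\right) \mathrm{d}f \\
&= -\frac{1}{2}\int_0^{W/n} \log\det\left(I_k + \frac{P}{\beta W} H_s(f) Q_s^{\mathrm{w}*}(f) Q_s^{\mathrm{w}}(f) H_s(f)\right) \mathrm{d}f + \frac{1}{2}\int_0^{W/n} \log\det\left(I_k + \frac{P}{\beta W} H_s^2(f)\right) \mathrm{d}f \\
&= -\frac{1}{2}\int_0^{W/n} \log\det\left(\frac{\beta W}{P} H_s^{-2}(f) + Q_s^{\mathrm{w}*}(f) Q_s^{\mathrm{w}}(f)\right) \mathrm{d}f + \frac{1}{2}\Delta_s, \quad (12)
\end{aligned}
$$

where $\Delta_s := \int_0^{W/n} \left[\log\det\left(I_k + \frac{P}{\beta W} H_s^2(f)\right) - \log\det\left(\frac{P}{\beta W} H_s^2(f)\right)\right] \mathrm{d}f$ denotes some residual term. In particular, $\Delta_s$ can be bounded as

$$0 \le \Delta_s \le \frac{\beta W}{\mathrm{SNR}_{\min}}. \quad (13)$$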

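This is an immediate consequence of the following observation: for any $k \times k$ positive semidefinite matrix $A$,

$$0 \le \frac{1}{k}\log\det(I_k + A) - \frac{1}{k}\log\det(A) = \frac{1}{k}\sum_{i=1}^{k} \log\left(1 + \frac{1}{\lambda_i(A)}\right) \le \frac{1}{\lambda_{\min}(A)}. \quad (14)$$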

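Recall that $\mathrm{SNR}_{\min} := \frac{P}{\beta W} \inf_{0 \le f \le W} |H(f)|^2$ and $\mathrm{SNR}_{\max} := \frac{P}{\beta W} \sup_{0 \le f \le W} |H(f)|^2$. Therefore, $\frac{\beta W}{P} H_s^{-2}(f)$
can be bounded as

$$\frac{1}{\mathrm{SNR}_{\min}} I_k \succeq \frac{\beta W}{P} H_s^{-2}(f) \succeq \frac{1}{\mathrm{SNR}_{\max}} I_k. \quad (15)$$

This bound together with (12) makes $\det(\epsilon I_k + Q_s^{\mathrm{w}*} Q_s^{\mathrm{w}})$ a quantity of interest (for some small $\epsilon$).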

1) The Converse: Note that $Q^{\mathrm{w}}(f)$ has orthonormal rows. The following theorem investigates the properties of
$\det(\epsilon I_k + B_s^* B_s)$ for any $m \times n$ matrix $B$ that has orthonormal rows. This, together with the Riemann integrability
assumption of $Q(f)$, establishes Theorem 4.

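Theorem 7. (1) Consider any $m \times n$ matrix $B$ ($n \ge m \ge k$) that satisfies $BB^* = I_m$, and denote by $B_s$ the
$m \times k$ submatrix of $B$ with columns coming from the index set $s$. Then for any $\epsilon > 0$, one has

$$
\begin{aligned}
\min_{s \in \binom{[n]}{k}} \frac{1}{n}\log\det\left(\epsilon I_k + B_s^* B_s\right) &\le \frac{1}{n}\log\binom{m}{k} - \frac{1}{n}\log\binom{n}{k} + 2\sqrt{\epsilon} \quad (16) \\
&\le \alpha\mathcal{H}\left(\frac{\beta}{\alpha}\right) - \mathcal{H}(\beta) + 2\sqrt{\epsilon} + \frac{\log(n+1)}{n}. \quad (17)
\end{aligned}
$$

(2) For any positive integer $p$, suppose that $B_1, \cdots, B_p$ are all $m \times n$ matrices such that $B_i B_i^* = I_m$. Then,

$$
\begin{aligned}
\min_{s \in \binom{[n]}{k}} \frac{1}{np}\sum_{i=1}^{p} \log\det\left(\epsilon I_k + (B_i)_s^* (B_i)_s\right) &\le \frac{1}{n}\log\binom{m}{k} - \frac{1}{n}\log\binom{n}{k} + 2\sqrt{\epsilon} \quad (18) \\
&\le \alpha\mathcal{H}\left(\frac{\beta}{\alpha}\right) - \mathcal{H}(\beta) + 2\sqrt{\epsilon} + \frac{\log(n+1)}{n}. \quad (19)
\end{aligned}
$$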



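Note that $Q^{\mathrm{w}}(f)$ has orthonormal rows for any $f$, and $Q^{\mathrm{w}}(f)$ is assumed to be Riemann integrable. For any
$\delta > 0$, we can find a sufficiently large $p$ such that

$$\int_0^{W/n} \log\det\left(\epsilon I_k + Q_s^{\mathrm{w}*}(f) Q_s^{\mathrm{w}}(f)\right) \mathrm{d}f \le \delta + \frac{W}{np}\sum_{i=1}^{p} \log\det\left(\epsilon I_k + Q_s^{\mathrm{w}*}\left(\frac{iW}{np}\right) Q_s^{\mathrm{w}}\left(\frac{iW}{np}\right)\right).$$

Since $\delta$ can be arbitrarily small, applying Theorem 7 immediately yields that for any $Q(\cdot)$:

$$\min_{s \in \binom{[n]}{k}} \int_0^{W/n} \log\det\left(\epsilon I_k + Q_s^{\mathrm{w}*}(f) Q_s^{\mathrm{w}}(f)\right) \mathrm{d}f \le W\left\{\alpha\mathcal{H}\left(\frac{\beta}{\alpha}\right) - \mathcal{H}(\beta) + 2\sqrt{\epsilon} + \frac{\log(n+1)}{n}\right\}.$$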

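This together with (12), (13) and (15) leads to

$$\inf_{Q} \max_{s \in \binom{[n]}{k}} L_s^Q \ge \frac{W}{2}\left\{\mathcal{H}(\beta) - \alpha\mathcal{H}\left(\frac{\beta}{\alpha}\right) - \frac{2}{\sqrt{\mathrm{SNR}_{\min}}} - \frac{\log(n+1)}{n}\right\},$$

which completes the proof of Theorem 4.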

2) Achievability (Landau-rate Sampling): When it comes to the achievability part, the major step is to quantify
$\det(\epsilon I_k + (MM^*)^{-1} M_s M_s^*)$ for every $s$. Interestingly, this quantity can be uniformly bounded due to the
concentration of spectral measure of random matrices [29]. This is stated in the following theorem, which
demonstrates the achievability for Landau-rate sampling.

Theorem 8. Let $M = (\zeta_{ij})_{1 \le i \le k, 1 \le j \le n}$ be a real-valued random matrix. Under the conditions of Theorem 5, one
has

-n (/3) - ^^^ < min - logdet (elk + (mM^V' M^mA < -H (/3) + 2^e+^^^^^^^^ 
n sg(H) n \ \ I J n 

with probability exceeding 1 — Cexp {—en) for some constants c, C > 0. 

Putting Theorem 8 and equations (12), (13) and (15) together establishes that



max L^ < 



W 



H(« + ^ + ^ 



with probability exceeding $1 - C\exp(-cn)$.

3) Achievability (Super-Landau-rate Sampling): Instead of studying a large class of sub-Gaussian random
ensembles, the following theorem focuses on i.i.d. Gaussian matrices, which establishes the optimality of Gaussian
random sampling for the super-Landau regime.

Theorem 9. Let $M = (\zeta_{ij})_{1 \le i \le m, 1 \le j \le n}$ be a real-valued i.i.d. Gaussian matrix. Under the conditions of Theorem
6, one has

-^-{P) + aU { -] + O \^^^] < min - logdet (eJfc + M!' (mM^) ^ M, 
\aj \ Vn J se(H) "V ^ ^ 

\aj n 

with probability at least $1 - C\exp(-cn)$ for some constants $c, C > 0$.

(Footnote: The proof argument for Landau-rate sampling cannot be readily carried over to the super-Landau regime since $M_s$ is now a tall matrix, and
hence we cannot separate $M_s$ and $MM^*$ easily.)



Combining Theorem 9 and equations (12), (13) and (15) implies that



M ^W 



' aj \ -y/n J SJNR 



mill 



with probability exceeding $1 - C\exp(-cn)$.

The proofs of Theorems 7, 8 and 9, which are provided in Appendices B, C and D respectively, rely heavily on
non-asymptotic (random) matrix theory.

IV. Discussion 

A. Implications of Main Results 

Under both Landau-rate and super-Landau-rate sampling, the minimax capacity loss depends almost entirely on
$\beta$ and $\alpha$. In this subsection, we summarize several key insights from the main theorems.

1) The Converse: Our analysis demonstrates that at high SNR, the loss $L_s^Q$ depends almost solely on the quantity

$$d(Q(f), s, \epsilon) := \det\left(\epsilon I_k + Q_s^{\mathrm{w}*}(f) Q_s^{\mathrm{w}}(f)\right)$$

for small $\epsilon > 0$, which is approximately the exponential of capacity loss at a given pair $(s, f)$. In fact, the key
observation underlying the proof of Theorem 4 is that for any $f$, the sum

$$\sum_{s \in \binom{[n]}{k}} d(Q(f), s, \epsilon)$$

is a constant independent of the sampling coefficient function $Q$. In other words, at any given $f$, the exponential
sum of capacity loss over all states $s$ is invariant regardless of what samplers we employ.

This invariant quantity is critical in identifying the minimax sampling method. In fact, it motivates us to seek 
a sampling method that achieves equivalent performance over all states s. Large random matrices exhibit sharp 
concentration of spectral measure, and hence become a natural candidate to attain minimaxity. 
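The invariance described above can be checked on a small instance. The following sketch draws two unrelated $m \times n$ matrices with orthonormal rows and compares the sum of $\det(\epsilon I_k + B_s^* B_s)$ over all $k$-subsets $s$ with the closed form $\sum_{l=0}^{k} \epsilon^{k-l}\binom{n-l}{k-l}\binom{m}{l}$ obtained in Appendix B. The helper names (`det`, `random_orthonormal_rows`, `invariant_sum`, `closed_form`) and the dimensions are illustrative assumptions, and the matrices are real-valued for simplicity.

```python
import itertools
import math
import random


def det(a):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    size = len(a)
    d = 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, size):
            f = a[r][c] / a[c][c]
            for j in range(c, size):
                a[r][j] -= f * a[c][j]
    return d


def random_orthonormal_rows(m, n, rng):
    """m x n matrix with orthonormal rows (Gram-Schmidt on Gaussian rows)."""
    rows = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:
            proj = sum(x * y for x, y in zip(v, u))
            v = [x - proj * y for x, y in zip(v, u)]
        norm = math.sqrt(sum(x * x for x in v))
        rows.append([x / norm for x in v])
    return rows


def invariant_sum(b, k, eps):
    """Sum over all k-subsets s of det(eps * I_k + B_s^T B_s)."""
    m, n = len(b), len(b[0])
    total = 0.0
    for s in itertools.combinations(range(n), k):
        gram = [[sum(b[r][i] * b[r][j] for r in range(m)) + (eps if i == j else 0.0)
                 for j in s] for i in s]
        total += det(gram)
    return total


def closed_form(n, m, k, eps):
    """sum_{l=0}^{k} eps^(k-l) * C(n-l, k-l) * C(m, l)."""
    return sum(eps ** (k - l) * math.comb(n - l, k - l) * math.comb(m, l)
               for l in range(k + 1))


n, m, k, eps = 7, 4, 3, 0.1
sum_a = invariant_sum(random_orthonormal_rows(m, n, random.Random(1)), k, eps)
sum_b = invariant_sum(random_orthonormal_rows(m, n, random.Random(2)), k, eps)
expected = closed_form(n, m, k, eps)
print(sum_a, sum_b, expected)
```

Both random draws give the same sum up to floating-point error, which is why no sampler can lower the loss at every state simultaneously: reducing $d(Q(f), s, \epsilon)$ at one state must raise it at another.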

2) Landau-rate Sampling: When sampling is performed at the Landau rate, the minimax capacity loss per unit
Hertz is almost solely determined by the entropy function $\mathcal{H}(\beta)$. Specifically, when $n$ and $k$ are sufficiently large,
the minimax limit depends only on the sparsity ratio $\beta = k/n$ rather than $(n, k)$. Some implications from Theorem
4 and Theorem 5 are as follows.

1) The capacity loss per unit Hertz is illustrated in Fig. 4(a). The capacity loss vanishes when $\beta \to 1$, since
Nyquist-rate sampling is information preserving. The capacity loss divided by $\beta$ is plotted in Fig. 4(b),
which provides a normalized view of the capacity loss. It can be seen that the normalized loss decreases
monotonically with $\beta$, indicating that the loss is more severe in sparse channels. Note that this is different
from an LTI channel whereby sampling at the Landau rate is sufficient to preserve all information. When the
channel state is uncertain, increasing the sampling rate above the Landau rate (but below the Nyquist rate)
effectively increases the SNR, and hence allows more information to be harvested from the noisy sampled
output.




Figure 4. Plot (a) illustrates $\mathcal{H}(\beta)/2$ vs. the sparsity ratio $\beta$, which characterizes the fundamental minimax capacity loss per Hertz up to a residual gap that vanishes as $n$ and $\mathrm{SNR}_{\min}$ grow. Plot (b) illustrates $\mathcal{H}(\beta)/(2\beta)$ vs. $\beta$, which corresponds approximately to the normalized capacity loss per Hertz.



2) The capacity loss incurred by independent random sampling meets the fundamental minimax limit for Landau-rate
sampling uniformly across all states $s$, which reveals that with exponentially high probability, random
sampling is optimal in terms of universal sampling design. The capacity achievable by random sampling
exhibits very sharp concentration around the minimax limit uniformly across all states $s \in \binom{[n]}{k}$.

3) 

4) A universality phenomenon that arises in large random matrices (e.g. p4| ) leads to the fact that the minimaxity 
of random sampling matrices does not depend on the particular distribution of the coefficients. For a large 
class of sub-Gaussian measure, as long as all entries are jointly independent with matching moments up to 
the second order, the sampling mechanism it generates is minimax with exponentially high probability. 
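The two curves discussed for the Landau-rate regime, $\mathcal{H}(\beta)/2$ and its normalized version $\mathcal{H}(\beta)/(2\beta)$, are easy to tabulate. The sketch below assumes entropy in nats and drops all residual terms; the grid of $\beta$ values and the variable names are illustrative choices.

```python
import math


def entropy(x):
    """Binary entropy in nats: H(x) = -x log x - (1 - x) log(1 - x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


betas = [i / 20 for i in range(1, 20)]
loss = [0.5 * entropy(b) for b in betas]            # loss per Hertz
normalized = [0.5 * entropy(b) / b for b in betas]  # loss per active Hertz

for b, l, nl in zip(betas, loss, normalized):
    print(f"beta = {b:.2f}: H/2 = {l:.4f}, H/(2*beta) = {nl:.4f}")
```

The normalized column decreases monotonically in $\beta$, consistent with the observation that sparse channels suffer the most per active Hertz, while the unnormalized loss peaks at $\beta = 1/2$ and vanishes as $\beta \to 1$.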

3) Super-Landau-Rate Sampling: The random sampling analyzed in Theorem 6 only involves Gaussian random
sampling, and we have not shown universality results. Some implications under super-Landau-rate sampling are as
follows.

1) Similar to the case with Landau-rate sampling, Gaussian sampling achieves minimax capacity loss uniformly
across all states $s$ within a large super-Landau-rate regime. The capacity gap is illustrated in Fig. 5. It can be
observed from the plot that increasing the $\alpha/\beta$ ratio improves the capacity gap, shrinks the locus and shifts
it leftwards.

2) Theorem 6 only concerns i.i.d. Gaussian random sampling instead of more general independent random
sampling. While we conjecture that the universality phenomenon continues to hold for other jointly
independent random ensembles with sub-Gaussian tails and appropriate matching moments, the mathematical
analysis turns out to be more tricky than in the Landau-rate sampling case.

3) The capacity gain by sampling above the Landau rate depends on the undersampling factor $\alpha$ as well.
Specifically, the capacity benefit per unit bandwidth due to super-Landau sampling is captured by the term
$\frac{\alpha}{2}\mathcal{H}(\beta/\alpha)$. When $\alpha \to 1$, the capacity loss per Hertz $\frac{1}{2}\mathcal{H}(\beta) - \frac{\alpha}{2}\mathcal{H}(\beta/\alpha)$ reduces to $0$,
meaning that there is effectively no capacity loss under Nyquist-rate sampling. This agrees with the fact that
Nyquist-rate sampling is information preserving.

Figure 5. The function $\frac{1}{2}\mathcal{H}(\beta) - \frac{\alpha}{2}\mathcal{H}(\beta/\alpha)$ vs. the sparsity ratio $\beta$ and the undersampling factor $\alpha$. Here, $\frac{1}{2}\mathcal{H}(\beta) - \frac{\alpha}{2}\mathcal{H}(\beta/\alpha)$ characterizes
the fundamental minimax capacity loss per Hertz up to a residual gap that vanishes as $n$ and $\mathrm{SNR}_{\min}$ grow.

V. Connections with Discrete-Time Sparse Channels 

A. Minimax Sampler in Discrete-time Sparse Channels 

Our results have straightforward extensions to discrete-time sparse vector channels. Specifically, consider a
collection of $n$ parallel channels. The channel input $\mathbf{x} \in \mathbb{R}^n$ is passed through the channel and contaminated
by Gaussian noise $\mathbf{n} \sim \mathcal{N}(0, I_n)$, yielding a channel output

$$\mathbf{r} = H\mathbf{x} + \mathbf{n},$$

where $H$ is some diagonal channel matrix. In particular, at each asymptotically long timeframe, a state $s \in \binom{[n]}{k}$
is generated, which dictates the set of channels available for transmission, i.e. the transmitter can only send $\mathbf{x}$ at
indices in $s$. One can then obtain $m$ measurements of the channel output through a sensing matrix $Q \in \mathbb{R}^{m \times n}$, i.e.
the measurements $\mathbf{y} \in \mathbb{R}^m$ can be expressed as

$$\mathbf{y} = Q\mathbf{r} = Q(H\mathbf{x} + \mathbf{n}).$$

The goal is then to identify a sensing matrix $Q$ that minimizes the worst-case capacity loss over all states $s \in \binom{[n]}{k}$.
If we abuse our notation L^ (resp. 1/^'°'") to denote the capacity loss at state s relative to Nyquist-rate capacity 
without (resp. with) power control, then the following results are immediate. 

Theorem 10. Define SNRmin := f infi<i<„ \Hu\ and SNRmax := f supi<i<„ \Hii\. 
(i) (Landau-rate sampling) If a — P (k = m), then 

inf max ^ = i(H(/3) + ofi^^)+^L|' (20) 

and 

Q «e(H) n 2\ ^ ' \ n J ^J 

(ii) (Super-Landau-rate sampling) Suppose that there is a small constant $\delta > 0$ such that $\alpha - \beta > \delta$ and
$1 - \alpha - \beta > \delta$. Then

inf max ^ = i i.n (/3) - aH ( ^) + O f^-^) + AslI , (22) 

and 

inf max ^^ = 1 U W) - an ( ^) + O f i^) + A^l . (23) 

Q ,e(i:i) " 2 \ ^^^ \aj \ V^ J ^^J 



Here, Al, A^'' , AgL and AS^ are some residual terms satisfying 



<Al,Asl<,,J— , and ,J^ <Ar,Agf< ^ + ^ 



where A is a constant defined as 

-i- . f tr (HH*) maxi<i<„ lif jJ 1 

yl := mm < j — ^-j, — ^f } ■ 

(^ fcmini<j<„ \H~^\ mini<i<„ \H^^\ J 

The key observations are that in a discrete-time sparse vector channel, the minimax capacity loss per degree of
freedom again depends only on $\beta$ and $\alpha$ modulo a vanishingly small gap. Independent random sensing matrices and
i.i.d. Gaussian sensing matrices are minimax in terms of a channel-blind sensing matrix design, in the Landau-rate
and super-Landau-rate regimes, respectively.
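To make the discrete-time model concrete, the sketch below evaluates the capacity loss of a given sensing matrix on a toy channel. It assumes no power control, i.e. total power $P$ split equally over the $k$ active indices, and it computes the sampled capacity as $\frac{1}{2}[\log\det(QQ^\top + \frac{P}{k} Q_s H_s^2 Q_s^\top) - \log\det(QQ^\top)]$, which accounts for the colored noise $Q\mathbf{n}$. The function names, the channel gains and the dimensions are illustrative assumptions rather than values from the paper.

```python
import itertools
import math
import random


def det(a):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    size = len(a)
    d = 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, size):
            f = a[r][c] / a[c][c]
            for j in range(c, size):
                a[r][j] -= f * a[c][j]
    return d


def capacity_loss(q, h, s, power):
    """Nyquist capacity minus sampled capacity (nats) at state s, equal power split."""
    m, n = len(q), len(q[0])
    c = power / len(s)
    gram = [[sum(q[i][t] * q[j][t] for t in range(n)) for j in range(m)]
            for i in range(m)]
    total = [[gram[i][j] + c * sum(q[i][t] * h[t] ** 2 * q[j][t] for t in s)
              for j in range(m)] for i in range(m)]
    sampled = 0.5 * (math.log(det(total)) - math.log(det(gram)))
    nyquist = 0.5 * sum(math.log(1.0 + c * h[t] ** 2) for t in s)
    return nyquist - sampled


def worst_case_loss(q, h, k, power):
    """Maximum loss over all k-subsets of active channels."""
    n = len(h)
    return max(capacity_loss(q, h, s, power)
               for s in itertools.combinations(range(n), k))


rng = random.Random(0)
n, k, power = 6, 2, 100.0
h = [1.0 + 0.5 * rng.random() for _ in range(n)]          # diagonal of H
q_full = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
# Nested sensing matrices: the first m rows of the same Gaussian matrix.
losses = {m: worst_case_loss(q_full[:m], h, k, power) for m in (2, 4, 6)}
for m, value in losses.items():
    print(f"m = {m}: worst-case loss = {value:.4f} nats")
```

With $m = n$ the sensing matrix is invertible and the loss is zero up to rounding; dropping rows can only increase the worst-case loss, since fewer measurements are a function of more measurements.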

B. Connections with Sparse Recovery 

1) Restricted Isometry Property (RIP): The readers familiar with compressed sensing ||7j, ||8) may naturally 
wonder whether the optimal sampling matrices M satisfy the restricted isometry property (RIP). An RIP constant 
6k with respect to a matrix M E C™^" is defined (e.g. pSJ) as the smallest quantity such that 



$$(1 - \delta_k)\|c\|^2 \le \|M_s c\|^2 \le (1 + \delta_k)\|c\|^2$$

holds for any vector $c$ and any $s$ of size at most $k$ (recall that $M_s$ is a submatrix consisting of $k$ columns of $M$).
This quantity measures how close the $M_s$'s are to orthonormal systems, and the existence of a small RIP constant that
does not scale with $(n, k, m)$ typically enables exact sparse recovery from noiseless measurements with a sensing
matrix $M$ [35].

Nevertheless, RIP is not necessary for approaching the minimax capacity loss. Consider the Landau-rate sampling 
regime for example. When the entries of M are independently generated under conditions of Theorem [8] one 
typically has (e.g. p6| ) 

<J,nin{Ms)=0(-j , 

which cannot be bounded away from $0$ by a constant. On the other hand, there is no guarantee that a restricted
isometric matrix M is sufficient to achieve minimaxity. An optimal sampling matrix typically has similar spectrum 
as an i.i.d. Gaussian matrix, but a general restricted isometric matrix does not necessarily exhibit similar spectrum. 

We note, however, that many randomized schemes for generating a restricted isometric matrix are natural 
candidates for generating minimax samplers. As shown by our analysis, in order to obtain a desired sampling 
matrix M, we require Ms to yield similar spectrum over all s. Many random matrices satisfying RIP exhibit this 
property, and are hence minimax. 

2) Necessary Sampling Rate: It is well known that there exists a spectrum-blind sampling matrix with $2k$ noiseless
measurements that admits perfect recovery of any $k$-sparse signal. In the continuous-time counterpart, the minimum
sampling rate for perfect recovery in the absence of noise is twice the spectrum occupancy fTJ]. Nevertheless, 
this sampling rate does not allow zero capacity loss in our setting. Since the channel output is contaminated by 
noise and thus has bandwidth W, a spectrum-blind sampler is unable to suppress spectral contents outside active 
subbands, and hence suffers from information rate loss relative to Nyquist-rate capacity. 

On the other hand, twice the spectrum occupancy does not have a threshold effect in our setting, as illustrated in
Figure 5. This arises from the fact that the transmitter can adapt its transmitted signal to the instantaneous realization
and sampling rate. For instance, the spectral support of the transmitted signal may also shrink as the sampling rate
decreases, thus avoiding an inflection point on the capacity curves.

VI. Conclusions 

We have investigated optimal universal sampling design from a capacity perspective. In order to evaluate the
loss due to universal sub-Nyquist sampling design, we introduced the notion of sampled capacity loss relative to
Nyquist-rate capacity, and characterized the overall robustness of the sampling design through the minimax capacity loss
metric. Specifically, we have determined the fundamental minimax limit on the sampled capacity loss achievable
by a class of channel-blind periodic sampling systems. This minimax limit turns out to be a constant that only
depends on the band sparsity ratio and undersampling factor, modulo a residual term that vanishes in the SNR
and the number of subbands. Our results demonstrate that with exponentially high probability, random sampling
is minimax in terms of a universal sampler design. This highlights the power of random sampling methods in the
channel-blind design. In addition, our results extend to discrete-time counterparts without difficulty. We demonstrate
that independent random sensing matrices are minimax in discrete-time sparse vector channels.

It remains to characterize the fundamental minimax capacity loss when sampling is performed below the Landau 
rate, and to be seen whether random sampling is still optimal in the sub-Landau-rate regime. It would also be 
interesting to extend this framework to situations beyond compound multiband channels, and our notion of sampled 
capacity loss will be useful in evaluating the robustness for these scenarios. Our framework and results may also 
be appropriate for other channels with state where sparsity exists in other transform domains. In addition, when 
it comes to multiple access channels or random access channels p3\ , it is not clear how to find a channel-blind 
sampler that is robust for the entire capacity region. 



Appendix A
Proof of Theorem 3

We would like to bound the gap between C*' and Cl^. In fact, the equation that determines the water level 
implies that 



^^'o tl\' {Hs{f)f 




iHs{f))i 



d/ 



(24) 



which in turns yields 



ly < 



P 

f3W 



rW/n 



pw 



With the above bound on the water level, the capacity can be bounded above as 
Z^J = 1 (HM))'-,
(«„(/))^j ■'

where A := maxsA ^^ ww ■ ^^^ ^^^ easily verify that A> 1. Therefore, 

^W/n ^ P , \ /-^Z" / P 



< 



y„ logdet(^A7+— /f^(/)jd/-y^ logdet(^/+— /f^(/)jd/ 

/ Z^i°gTT^WFf^7n^^- / Z^^^^T-— p |g(/.s)| d/ 



<;9W^l0g( 1 "^ ^ 



<w 



1 + SNR,„i„ 



1 + SNRniin 

We also observe that 



"^/" . («.(/))?... r'v^ (^-(z))-. 



A = max — ^^ < 

pw - pw 

. , "l^^^e I^Vo SM) ■' '^"Po</<H',sG M 5,(/) 

< mm / ^ ' 



Combining the above bounds and Theorem |2] completes the proof. 

Appendix B
Proof of Theorem 7
Before proving the results, we first state two facts. Consider any $m \times m$ matrix $A$, and list the eigenvalues of
$A$ as $\lambda_1, \cdots, \lambda_m$. Define the characteristic polynomial of $A$ as

$$p_A(t) = \det(tI - A) = t^m - S_1(\lambda_1, \cdots, \lambda_m)\, t^{m-1} + \cdots + (-1)^m S_m(\lambda_1, \cdots, \lambda_m),$$

where $S_l(\lambda_1, \cdots, \lambda_m)$ is the $l$th elementary symmetric function of $\lambda_1, \cdots, \lambda_m$ defined as follows:

$$S_l(\lambda_1, \cdots, \lambda_m) := \sum_{1 \le i_1 < \cdots < i_l \le m} \prod_{j=1}^{l} \lambda_{i_j}.$$

We also define $E_l(A)$ as the sum of determinants of all $l$-by-$l$ principal minors of $A$. According to [38, Theorem
1.2.12], $S_l(\lambda_1, \cdots, \lambda_m) = E_l(A)$ holds for all $1 \le l \le m$. After a little manipulation we obtain

$$\det(tI + A) = t^m + E_1(A)\, t^{m-1} + \cdots + E_m(A). \quad (25)$$
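Identity (25) can be verified directly on a small matrix. The sketch below computes $E_l(A)$ by enumerating all $l$-by-$l$ principal minors and compares $\sum_{l=0}^{m} t^{m-l} E_l(A)$ (with $E_0 := 1$) against $\det(tI + A)$. The test matrix and helper names are arbitrary illustrative choices.

```python
import itertools


def det(a):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    size = len(a)
    d = 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, size):
            f = a[r][c] / a[c][c]
            for j in range(c, size):
                a[r][j] -= f * a[c][j]
    return d


def principal_minor_sum(a, l):
    """E_l(A): sum of determinants of all l-by-l principal minors (E_0 = 1)."""
    if l == 0:
        return 1.0
    m = len(a)
    return sum(det([[a[i][j] for j in r] for i in r])
               for r in itertools.combinations(range(m), l))


A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
t = 0.5
m = len(A)

lhs = det([[A[i][j] + (t if i == j else 0.0) for j in range(m)] for i in range(m)])
rhs = sum(t ** (m - l) * principal_minor_sum(A, l) for l in range(m + 1))
print(lhs, rhs)
```

For this matrix $E_1 = 9$, $E_2 = 24$ and $E_3 = 18$, so both sides evaluate to $0.125 + 2.25 + 12 + 18 = 32.375$.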

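Another fact we would like to stress is the entropy formula of binomial coefficients. Specifically, for any $0 \le k \le n$, one has [39, Page 43]

$$\frac{1}{n+1} e^{n\mathcal{H}(k/n)} \le \binom{n}{k} \le e^{n\mathcal{H}(k/n)},$$

and hence

$$\mathcal{H}\left(\frac{k}{n}\right) - \frac{\log(n+1)}{n} \le \frac{1}{n}\log\binom{n}{k} \le \mathcal{H}\left(\frac{k}{n}\right), \quad (26)$$

where $\mathcal{H}(x) := x\log\frac{1}{x} + (1 - x)\log\frac{1}{1-x}$ denotes the entropy function.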
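The entropy bounds on binomial coefficients can be confirmed by brute force for moderate $n$. The sketch below checks $n\mathcal{H}(k/n) - \log(n+1) \le \log\binom{n}{k} \le n\mathcal{H}(k/n)$ for every $0 \le k \le n$, with natural logarithms; the helper name `binomial_bounds_hold` and the tested values of $n$ are illustrative.

```python
import math


def entropy(x):
    """Binary entropy in nats: H(x) = x log(1/x) + (1 - x) log(1/(1 - x))."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def binomial_bounds_hold(n, tol=1e-9):
    """Check n*H(k/n) - log(n+1) <= log C(n, k) <= n*H(k/n) for all 0 <= k <= n."""
    for k in range(n + 1):
        log_binom = math.log(math.comb(n, k))
        upper = n * entropy(k / n)
        lower = upper - math.log(n + 1)
        if not (lower - tol <= log_binom <= upper + tol):
            return False
    return True


results = {n: binomial_bounds_hold(n) for n in (5, 40, 200)}
print(results)
```

Dividing through by $n$ gives (26), and the slack $\log(n+1)/n$ is the source of the $\log(n+1)/n$ residual in the converse.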
Now we are in a position to derive the proof for our main results.

(1) Consider an $m \times n$ matrix $B$ with orthonormal rows ($m \ge k$), i.e. $BB^* = I_m$. Using the identity (25),
we can derive

$$\sum_{s \in \binom{[n]}{k}} \det\left(\epsilon I_m + B_s B_s^*\right) = \sum_{s \in \binom{[n]}{k}} \left\{\epsilon^m + \sum_{l=1}^{m} \epsilon^{m-l} E_l(B_s B_s^*)\right\} = \epsilon^m \binom{n}{k} + \sum_{l=1}^{k} \epsilon^{m-l} \sum_{s \in \binom{[n]}{k}} E_l(B_s B_s^*), \quad (27)$$

where the last equality follows by observing that any $l$th order ($l > k$) minor of $B_s B_s^*$ is rank deficient, and hence
$E_l(B_s B_s^*) = 0$.

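Consider an index set $r \in \binom{[m]}{l}$ with $l \le k$, and denote by $(B_s B_s^*)_r$ the submatrix of $B_s B_s^*$ with rows and
columns coming from the index set $r$. One can then verify that

$$\det\left((B_s B_s^*)_r\right) = \det\left(B_{r,s} B_{r,s}^*\right) = \sum_{\tilde{r} \in \binom{s}{l}} \det\left(B_{r,\tilde{r}} B_{r,\tilde{r}}^*\right),$$

where the last equality arises from the Cauchy-Binet formula (e.g. [36]). Some algebraic manipulation yields that
for any $l \le k$:

$$
\begin{aligned}
\sum_{s \in \binom{[n]}{k}} E_l(B_s B_s^*) &= \sum_{s \in \binom{[n]}{k}} \sum_{r \in \binom{[m]}{l}} \det\left((B_s B_s^*)_r\right) = \sum_{s \in \binom{[n]}{k}} \sum_{r \in \binom{[m]}{l}} \sum_{\tilde{r} \in \binom{s}{l}} \det\left(B_{r,\tilde{r}} B_{r,\tilde{r}}^*\right) \\
&= \sum_{r \in \binom{[m]}{l}} \sum_{\tilde{r} \in \binom{[n]}{l}} \sum_{s:\, \tilde{r} \subseteq s} \det\left(B_{r,\tilde{r}} B_{r,\tilde{r}}^*\right) \\
&\overset{(a)}{=} \sum_{r \in \binom{[m]}{l}} \sum_{\tilde{r} \in \binom{[n]}{l}} \binom{n-l}{k-l} \det\left(B_{r,\tilde{r}} B_{r,\tilde{r}}^*\right) \\
&\overset{(b)}{=} \binom{n-l}{k-l}\binom{m}{l}, \quad (28)
\end{aligned}
$$

where (a) follows from the fact that the number of $k$-combinations (out of $[n]$) containing $\tilde{r}$ (an $l$-combination) is
$\binom{n-l}{k-l}$, and (b) follows from the Cauchy-Binet formula and the fact that $BB^* = I_m$, i.e.

$$\sum_{\tilde{r} \in \binom{[n]}{l}} \det\left(B_{r,\tilde{r}} B_{r,\tilde{r}}^*\right) = \det\left(B_{r,[n]} B_{r,[n]}^*\right) = \det(I_l) = 1.$$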



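Substituting (28) into (27) yields

$$\sum_{s \in \binom{[n]}{k}} \det\left(\epsilon I_m + B_s B_s^*\right) = \epsilon^m \binom{n}{k} + \sum_{l=1}^{k} \epsilon^{m-l} \sum_{s \in \binom{[n]}{k}} E_l(B_s B_s^*) = \sum_{l=0}^{k} \binom{n-l}{k-l}\binom{m}{l} \epsilon^{m-l}.$$

By observing that $B_s$ is a tall $m \times k$ matrix, one has

$$\det\left(\epsilon I_k + B_s^* B_s\right) = \epsilon^{k-m} \det\left(\epsilon I_m + B_s B_s^*\right) \quad \Longrightarrow \quad \sum_{s \in \binom{[n]}{k}} \det\left(\epsilon I_k + B_s^* B_s\right) = \sum_{l=0}^{k} \binom{n-l}{k-l}\binom{m}{l} \epsilon^{k-l}.$$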
The above expression allows us to derive a crude bound as 



("] min det (e/ + B:i?,) < ^ det {el + BlB,) =J2 



^k-l 



n — l\ fm 



k-l \l 



k^i 



1=0 



<!:(,.!, 7)''-'=i:i-" ^ ■- 



1=0 



I I \m — k + I 



E 



\m-k+U I 



• ^ 1=0 ^ ^ \k) 

Since the term („^ i ;)/(™) can be bounded above as 

„i-k+l) _ (rn-k+iy.{k-l)\ _ k\ 



(t) 



< 



fe!(m-fe)! 



(^-0!^^^;^ m-V 



(m — k) 



we obtain 



mm det{el + BIB,) < V dot (e/ + S:^,) < ( ^ ) V ^ " ^ ^""'+''' 
kj ..g(H) 



^e(>;:0 



k^-^\i 

■=o 



+i/ gi 



^'DsOC^'' 



;=o 



<'-'E(\r)(v^) 



n + k 
21 



21 



<l"^^v7" + ')(^/i)^ 



E 

4=0 






where the last equality follows from the binomial theorem. 

(2) Since Bi has orthonormal rows, applying the inequality of arithmetic and geometric means yields 



24 



(29) 



^ [l[det{eI, + iB,):{B,)^)\ < ^ 
se(H) V.=i / ,,(H) 



J:t,det{elk + {B,):{B, 



^-1 [«e(H) 

')(l + yi)"+\ 
where the second inequality follows from (29). Since M^ has orthonormal rows, applying (29) yields






<Q(l + x/^)""^ (30) 

Therefore, 

^j^, ^ E log det (6/, + {B,): (B,) J = min ^ log ( JJ det (eJ^ + (B,): (B,) J ) 



1 , /r?T,\ 1 , /n\ n + fc , . ^^ 



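When $(n, k, m)$ are all large numbers, the entropy approximation (26) allows us to approximate the above bound as

$$
\begin{aligned}
\min_{s \in \binom{[n]}{k}} \frac{1}{np}\sum_{i=1}^{p} \log\det\left(\epsilon I_k + (B_i)_s^* (B_i)_s\right) &\le \frac{m}{n}\mathcal{H}\left(\frac{k}{m}\right) - \mathcal{H}\left(\frac{k}{n}\right) + \frac{\log(n+1)}{n} + 2\sqrt{\epsilon} \\
&= \alpha\mathcal{H}\left(\frac{\beta}{\alpha}\right) - \mathcal{H}(\beta) + 2\sqrt{\epsilon} + \frac{\log(n+1)}{n}. \quad (31)
\end{aligned}
$$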

Appendix C
Proof of Theorem 8

We focus on real-valued random matrices in this theorem. The quantity of interest can be further lower bounded 

by 

log det (elk + {mmA'^ M^M^j = log dot Umm'^ + M sMI\ - log det (mmA 



^a^i„ (mM^) Ik + \MsMI\ - log det (^MM 

rp rp / 'T'\ 

which can hopefully separate MsM^ and MM if CTmin [MM 1 is a constant bounded away from zero. The 
concentration of the least singular value of a rectangular random matrix with independent sub-Gaussian entries has 
been largely studied in the random matrix literature (e.g. p8) , pO)), which we cite as follows. 

Lemma 1. Suppose that $m \le (1 - \delta)n$ for some constant $\delta > 0$. Let $M$ be an $m \times n$ real-valued random matrix
whose entries are jointly independent sub-Gaussian random variables with zero mean and unit variance. Then there
exist constants $C, c > 0$ such that

an-,in{MM*) > Cm 
with probability at least $1 - \exp(-cn)$. In particular, if the entries of $M$ are i.i.d. Gaussian random variables with
zero mean and unit variance, then



a,„i„(MM*) > 



1 



with probability at least 1 — 4 exp (—en). 

Setting m ^ k, we can derive that with probability exceeding 1 — 4 exp {—en), 
logdet (elk + (mM^Y^ M^MJj > logdet feCIt + ^M 
holds for general independent sub-Gaussian matrices, and 



,Mi 



logdet MM' 
k 



logdet ( eJfc +[MM^^ ' M^M^) > logdet le ^^ ^' I 



k + \msMI 



logdet ( tMM' 



(32) 



(33) 



holds for i.i.d. Gaussian matrices. 

The next step is to quantify the term logdet [^1+ ^MsM^ J for some small e > 0. The following lemma 
characterizes the concentration of measure with respect to this term. 

Lemma 2. Suppose that $k/p \in (0, 1]$ is a fixed constant. Consider a real-valued random matrix $A = [\zeta_{ij}]_{1 \le i \le k, 1 \le j \le p}$
where the $\zeta_{ij}$ are jointly independent with zero mean and unit variance, and satisfy one of the following conditions:

(a) $\zeta_{ij}$ is almost surely bounded by a constant $D$;

(b) $\zeta_{ij}$ satisfies the logarithmic Sobolev inequality with uniformly bounded constant $c_{\mathrm{LS}}$.



Then, for any e > and S > rrit^ v there exists a constant c > such that 



and 



- logdet ( el + -AA' I - E ( - logdet ( e/ + -AA' 



-logdet' i-AA'^] -E( -logdet' I^AA'^ 
P \P J \P \P 



> 6] <4exp(-&2<52fc3), 



;;,2c2 3\ 



> 6 ] < 4exp(-ce (5V 



(34) 



(35) 



Proof: Observe that the Lipschitz constant of log(e + a;) is upper bounded by 1/e when x > 0. If Cij is almost 



surely bounded by D and hence each entry of -TfA is bounded by ^, then applying |29 Corollary 1.8(a)] leads 



Vk 



Vk' 



to 



I logdet (el + IaaA - E ( | logdet (el + \aA^ 
k \ k \k \ k 



>^^ 



< 4 exp 



2D./^ 



k{k+pY 



"^D^ \ eVk{k+p), 
Setting (5 to be a positive constant independent of k, we have for sufficiently large k that 



I logdet (el + tAA'^] - E (^ logdet (el + \aA^ 
k \ k I \k \ k 



> 



k + p , 



(5 < 4 exp - 



4D2 



t 



If (ij satisfies the logarithmic Sobolev inequality with uniformly bounded constant cls, then the logarithmic 
Sobolev constant is bounded above by ^cls, and hence |29 Corollary 1.8(b)] leads to 

e^S^k{k+pf\ 



-logdet (el+ -AA^] -Ef -logdet ( el + -AA^ 
k \ k J \ k \ k 



>^5) <2exp|-: 



2cLs 
The proof is complete by observing that $k/p$ is a given constant.

Given that the Lipschitz constant of the function log'^ x is also 1/e, the concentration result for - log det'^ ( - A A J 
follows with the same machinery. ■ 

Now that we have established the concentration results for ^ log det iel+j, A A j , it remains to determine 
E ( i log det ( el + ].-A.A J J . This is established in the following lemma. 

Lemma 3. Let matrix A = (Cij)i<i ,</£ be a real-valued random matrix such that all entries are jointly independent 
with zero mean and unit variance. For any small constant e > 0, we have 



E 



n / 1 -rM 

-log det eI+-AA^ 
k \ k J 


< ^logE 


det (el + -AaA 



-1 + 



logfc 



2^/^. 



Additionally, under Condition (a) or (b) of Lemma [2] A satisfies 



E ( -log det (eI+-AA'^ 



> -1-0 



lege 



(36) 



(37) 



Proof: See Appendix IE] 



Let us fix (5 = ^^f^- Combining Lemma 3 with Lemma 2 yields that under the assumptions of Lemma 2 one 
has for any small constant e > 0, for sufficiently large k we have 

log/c 



O 



lege 
k 



< 



and hence 



-logdet ( eJ+ tAA^ ) e 



2 logfc ^^^flogk 



2yfe 



(38) 



fc ' V '^ 

holds with probability exceeding 1 — 4cxp (—ce^fc logfc) for some constant c > 0. 

By our assumption, Mg satisfies the conditions of Lemma 2 For any alphabet S C ('^''), it contains at most 
(^') w g"W(/9) different states. Hence, applying the union bound gives 



Vs ; - logdet (el + yM^M'^ ] e 
n \ k 



-^-^,-^ + 0(^)+2^^ 



(39) 



with probability exceeding 1 — 4 exp {nH (/3)) exp (^—ce^k log fc) > 1 — 4 exp (— ce^fc log fc) . Here, c is some positive 
constant. 

The last step is to quantify ^ logdet ( ^MM 1. This is evaluated in the following lemma. 

Lemma 4. Suppose that a :— — is a fixed constant independent from {m,n) and that there exists a constant 
6g > such that a < 1 — Sg. Let matrix A ~ iCij) i<i<rn i< <n ^^ '^ real-valued random matrix such that all 
entries are jointly independent with zero mean and unit variance. 

(1) Under Condition (a) or (b) of Lemma \2\ for any e < - (1 — \/a) , there exists a constant Cj > such that 

.T^^., „.,„„ 1 _ 31ogm^ ^^^^ 



-E log det' ( -AA^] < (l-a)log a 

n \n I 1 — a 



for sufficiently large n, and 



-logdetf -AA^ I < -logdet' (-AA^] < (1 - a) log -^ 
n \ n / n \n I 1 — a 



41ogm 
holds with probability exceeding $1 - 4\exp(-c_7 \epsilon^2 n \log m)$.
(2) If A is drawn from i.i.d. Gaussian ensemble, then 

- \ogdet (-AaA > (1 - a) log -^ -a + (^^ 
n \n J 1 — a \ \Jn 

with probability exceeding 1 — exp (cgnlogn) for some constant cg > 0. 

Proof: See Appendix IF] ■ 
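The benchmark value $(1 - \alpha)\log\frac{1}{1-\alpha} - \alpha$ in Lemma 4 can be compared against a single Gaussian draw. The sketch below forms $\frac{1}{n}AA^\top$ for an $m \times n$ i.i.d. standard Gaussian matrix and evaluates $\frac{1}{n}\log\det$ by elimination on the positive definite Gram matrix. The dimensions, the seed and the helper name `logdet_spd` are illustrative assumptions, and agreement at this size is only approximate.

```python
import math
import random


def logdet_spd(a):
    """log det of a symmetric positive definite matrix via unpivoted elimination."""
    a = [row[:] for row in a]
    size = len(a)
    total = 0.0
    for c in range(size):
        pivot = a[c][c]  # positive for a positive definite matrix
        total += math.log(pivot)
        for r in range(c + 1, size):
            f = a[r][c] / pivot
            for j in range(c, size):
                a[r][j] -= f * a[c][j]
    return total


rng = random.Random(1)
n, m = 120, 60
alpha = m / n

A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
# Normalized Gram matrix (1/n) A A^T, of size m x m.
W = [[sum(x * y for x, y in zip(A[i], A[j])) / n for j in range(m)]
     for i in range(m)]

empirical = logdet_spd(W) / n
limit = (1.0 - alpha) * math.log(1.0 / (1.0 - alpha)) - alpha
print(f"empirical = {empirical:.4f}, limit = {limit:.4f}")
```

For $\alpha = 1/2$ the limit is about $-0.153$, and the empirical value typically falls within a few hundredths of it, in line with the concentration the lemma asserts.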

Since M satisfies the assumptions of Lemma HI with fi — a, simple manipulation gives 

1 /I t\1 /I rr\ k n 

- logdet -MM^ = - logdet MM^ ] + -'^og- 
n \k J n \n J n k 

< (1-/3) log ^-/? + /31og^ + ^^°^ (41) 

1 — p p n 

with probability exceeding 1 — 5 exp (— cnlog^ /c). 

The above results (32 1, (33 1, ( [39| and ( |4T] l taken collectively yield the following, under the condition of Theorem 
ID one has 



Vs : - logdet (elk + {mM^'^ ^ MsmA > -H (/3) - ^^^ (42) 

with probability exceeding 1 — exp (—en) for some c > 0. Combining this lower bound with the upper bound 
developed in Theorem |7] (with a = f3) completes our proof. 

Appendix D
Proof of Theorem 9

Our goal is to evaluate ^ log det ( elk + M^ ( MM j Ms ) for some small e > 0. We first define two 
Wishart matrices S\s := y^MM^ - y^M sM^ and S^ := j^MsM^. Apparently, S^ -- Wm {k, ^Im) and 
2\s ^ Wm {n — k, -^Im)- When 1 — a > /3, i.e. n — k > m, the Wishart matrix S^^ is invertible with probability 
1. 

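One difficulty in evaluating $\det\left(\epsilon I_k + M_s^{\top}\left(MM^{\top}\right)^{-1}M_s\right)$ is that $M_s$ and $MM^{\top}$ are not independent. This motivates us to decouple them first as follows:

$$\begin{aligned}
\det\left(\epsilon I_k + M_s^{\top}\left(MM^{\top}\right)^{-1}M_s\right)
&= \epsilon^{k-m}\det\left(\epsilon I_m + \left(\tfrac{1}{n}MM^{\top}\right)^{-1}\tfrac{1}{n}M_sM_s^{\top}\right) \\
&= \epsilon^{k-m}\det\left(\epsilon\,\tfrac{1}{n}MM^{\top} + \tfrac{1}{n}M_sM_s^{\top}\right)\det{}^{-1}\left(\tfrac{1}{n}MM^{\top}\right) \\
&= \epsilon^{k-m}\det\left(\epsilon\,\Xi_{\backslash s} + (1+\epsilon)\,\Xi_s\right)\det{}^{-1}\left(\tfrac{1}{n}MM^{\top}\right) \\
&= \epsilon^{k-m}\det\left(\epsilon I_m + (1+\epsilon)\,\Xi_s\Xi_{\backslash s}^{-1}\right)\det\left(\Xi_{\backslash s}\right)\det{}^{-1}\left(\tfrac{1}{n}MM^{\top}\right) \\
&= \det\left(\epsilon I_k + (1+\epsilon)\tfrac{1}{n}M_s^{\top}\Xi_{\backslash s}^{-1}M_s\right)\det\left(\Xi_{\backslash s}\right)\det{}^{-1}\left(\tfrac{1}{n}MM^{\top}\right),
\end{aligned}$$

or, equivalently,

$$\frac{1}{n}\log\det\left(\epsilon I_k + M_s^{\top}\left(MM^{\top}\right)^{-1}M_s\right) = \frac{1}{n}\log\det\left(\epsilon I_k + (1+\epsilon)\tfrac{1}{n}M_s^{\top}\Xi_{\backslash s}^{-1}M_s\right) + \frac{1}{n}\log\det\left(\Xi_{\backslash s}\right) - \frac{1}{n}\log\det\left(\tfrac{1}{n}MM^{\top}\right). \tag{43}$$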
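Identity (43) is purely algebraic, so it can be checked on any matrix with $n - k > m$. The following minimal sketch assumes the state $s$ selects the first $k$ columns of $M$ (an arbitrary illustrative choice) and compares both sides without the common $1/n$ factor; the variable names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k, eps = 4, 12, 3, 0.1  # illustrative sizes with n - k > m

M = rng.standard_normal((m, n))
Ms = M[:, :k]  # columns indexed by the (hypothetical) state s = {1, ..., k}

Xi_s = Ms @ Ms.T / n              # (1/n) M_s M_s^T
Xi_rest = M @ M.T / n - Xi_s      # (1/n) M M^T - (1/n) M_s M_s^T


def logdet(x: np.ndarray) -> float:
    """Log-determinant of a positive definite matrix."""
    sign, val = np.linalg.slogdet(x)
    assert sign > 0
    return val


# Left-hand side of (43), multiplied by n.
lhs = logdet(eps * np.eye(k) + Ms.T @ np.linalg.inv(M @ M.T) @ Ms)

# Right-hand side of (43), multiplied by n.
rhs = (
    logdet(eps * np.eye(k) + (1 + eps) / n * Ms.T @ np.linalg.inv(Xi_rest) @ Ms)
    + logdet(Xi_rest)
    - logdet(M @ M.T / n)
)
print(lhs, rhs)
```

Both printed values should coincide up to floating-point error, whatever the entries of `M`.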



The point of developing this identity (43) is to decouple the left-hand side of (43) through 3 matrices $M_s^{\top}\Xi_{\backslash s}^{-1}M_s$, $\Xi_{\backslash s}$ and $MM^{\top}$. In particular, since $M_s$ and $\Xi_{\backslash s}$ are jointly independent, we can examine the concentration of measure for $M_s$ and $\Xi_{\backslash s}$ separately when evaluating $M_s^{\top}\Xi_{\backslash s}^{-1}M_s$.



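The second and third terms of (43) can be evaluated through Lemma 4, which indicates that

$$\frac{1}{n}\log\det\left(\frac{1}{n}MM^{\top}\right) \le -(1-\alpha)\log(1-\alpha) - \alpha + O\left(\frac{\log n}{n}\right) \tag{44}$$

and, for all $s \in \binom{[n]}{k}$,

$$\begin{aligned}
\frac{1}{n}\log\det\left(\Xi_{\backslash s}\right)
&= \frac{n-k}{n}\cdot\frac{1}{n-k}\log\det\left(\frac{n}{n-k}\,\Xi_{\backslash s}\right) + \frac{1}{n}\log\det\left(\frac{n-k}{n}I_m\right) \\
&\ge (1-\beta)\left\{-\left(1-\frac{\alpha}{1-\beta}\right)\log\left(1-\frac{\alpha}{1-\beta}\right) - \frac{\alpha}{1-\beta}\right\} + \alpha\log(1-\beta) + O\left(\frac{\log^{2} n}{\sqrt{n}}\right) \\
&= -(1-\alpha-\beta)\log\left(1-\frac{\alpha}{1-\beta}\right) - \alpha + \alpha\log(1-\beta) + O\left(\frac{\log^{2} n}{\sqrt{n}}\right)
\end{aligned} \tag{45}$$

hold simultaneously with probability exceeding $1 - \exp(-c_9 n)$.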

Our main task is then to quantify $\log\det\left(\epsilon I_k + \frac{1}{n}M_s^{\top}\Xi_{\backslash s}^{-1}M_s\right)$, which we derive through the following lemma.

Lemma 5. Suppose that $m > k$. Let $A = (\zeta_{ij})_{1 \le i \le m,\, 1 \le j \le k}$ and $B$ be two independent matrix ensembles such that $\zeta_{ij} \sim \mathcal{N}(0,1)$ are jointly independent, and $B \sim \mathcal{W}_m(n-k, I_m)$. Then

$$\frac{1}{n}\log\det\left(\epsilon I_k + A^{\top}B^{-1}A\right) \ge -(\alpha-\beta)\log(\alpha-\beta) + \alpha\log\alpha + (1-\alpha-\beta)\log\left(1-\frac{\beta}{1-\alpha}\right) - \beta\log(1-\alpha) + O\left(\frac{\log^{2} n}{\sqrt{n}}\right)$$

holds with probability exceeding $1 - 2\exp(-c_9 n\log n)$ for some constant $c_9$.

Proof: See Appendix G. ■

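Lemma 5 develops a lower bound for $\frac{1}{n}\log\det\left(\epsilon I_k + \frac{1}{n}M_s^{\top}\Xi_{\backslash s}^{-1}M_s\right)$. This together with (44), (45) and (43) yields that

$$\begin{aligned}
\frac{1}{n}\log\det\left(\epsilon I_k + M_s^{\top}\left(MM^{\top}\right)^{-1}M_s\right)
\ge{}& -(\alpha-\beta)\log(\alpha-\beta) + \alpha\log\alpha + (1-\alpha-\beta)\log\left(1-\frac{\beta}{1-\alpha}\right) - \beta\log(1-\alpha) \\
& - (1-\alpha-\beta)\log\left(1-\frac{\alpha}{1-\beta}\right) - \alpha + \alpha\log(1-\beta) + (1-\alpha)\log(1-\alpha) + \alpha + O\left(\frac{\log^{2} n}{\sqrt{n}}\right)
\end{aligned}$$

with probability exceeding $1 - 4\exp(-c_9 n\log n)$.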
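The leading terms of this lower bound collapse to a compact entropy expression. As a side calculation of our own (not a formula quoted from the source), collecting the logarithms shows that they sum to $\alpha H(\beta/\alpha) - H(\beta)$, where $H(x) = -x\log x - (1-x)\log(1-x)$ is the binary entropy in nats. The sketch below checks this numerically for a few illustrative pairs with $\beta < \alpha$ and $\alpha + \beta < 1$.

```python
import math


def binary_entropy(x: float) -> float:
    """H(x) = -x log x - (1 - x) log(1 - x), natural logarithm."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def bound_leading_terms(alpha: float, beta: float) -> float:
    """Sum of the non-residual terms of the lower bound (needs beta < alpha, alpha + beta < 1)."""
    a, b = alpha, beta
    return (
        -(a - b) * math.log(a - b) + a * math.log(a)
        + (1 - a - b) * math.log(1 - b / (1 - a)) - b * math.log(1 - a)
        - (1 - a - b) * math.log(1 - a / (1 - b)) - a + a * math.log(1 - b)
        + (1 - a) * math.log(1 - a) + a
    )


def closed_form(alpha: float, beta: float) -> float:
    """Candidate simplification alpha * H(beta / alpha) - H(beta)."""
    return alpha * binary_entropy(beta / alpha) - binary_entropy(beta)


pairs = [(0.5, 0.2), (0.6, 0.3), (0.3, 0.1)]  # illustrative (alpha, beta) values
for a, b in pairs:
    print(a, b, bound_leading_terms(a, b), closed_form(a, b))
```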



Since there are at most $\binom{n}{k} \le e^{nH(\beta)}$ different states $s$, applying a union bound over all states completes the proof.



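Appendix E
Proof of Lemma 3

(1) The identity (25) gives

$$\mathbb{E}\left[\det\left(\epsilon k I + AA^{\top}\right)\right] = (\epsilon k)^{k} + \sum_{i=1}^{k}(\epsilon k)^{k-i}\,\mathbb{E}\left[E_i\left(AA^{\top}\right)\right],$$

where $E_i\left(AA^{\top}\right)$ denotes the sum of determinants of all $i$-by-$i$ principal minors of $AA^{\top}$. There is a well known fact that for any $G = (\zeta_{ij})_{1 \le i,j \le l}$ with independent entries having mean zero and variance one, one has $\mathbb{E}\det\left(GG^{\top}\right) = l!$. In fact, if we denote by $\Pi_l$ the permutation group of $l$ elements, then the Leibniz formula for the determinant gives

$$\det(G) = \sum_{\sigma \in \Pi_l}\operatorname{sgn}(\sigma)\prod_{i=1}^{l}\zeta_{i,\sigma(i)}.$$

Since $\zeta_{ij}$ are jointly independent, we have

$$\mathbb{E}\det\left(GG^{\top}\right) = \mathbb{E}\left(\det(G)\right)^{2} = \sum_{\sigma \in \Pi_l}\mathbb{E}\left[\prod_{i=1}^{l}\zeta_{i,\sigma(i)}^{2}\right] = \sum_{\sigma \in \Pi_l}\prod_{i=1}^{l}\mathbb{E}\left[\zeta_{i,\sigma(i)}^{2}\right] = l!. \tag{46}$$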
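Fact (46) holds for any independent entries with mean zero and unit variance, so it can be verified exactly on a small discrete ensemble. The sketch below is an illustration under our own choice of ensemble: it enumerates all $l \times l$ matrices with independent uniform $\pm 1$ entries and averages $\det(G)^2$ with exact integer arithmetic.

```python
from fractions import Fraction
from itertools import permutations, product


def int_det(mat) -> int:
    """Determinant via the Leibniz formula (exact for integer entries)."""
    size = len(mat)
    total = 0
    for perm in permutations(range(size)):
        # Sign of the permutation from its inversion count.
        inversions = sum(
            1 for i in range(size) for j in range(i + 1, size) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= mat[row][col]
        total += term
    return total


def mean_squared_det(size: int) -> Fraction:
    """Exact E[det(G)^2] when the entries of G are independent uniform +/-1 signs."""
    acc, count = 0, 0
    for entries in product((-1, 1), repeat=size * size):
        mat = [entries[r * size:(r + 1) * size] for r in range(size)]
        acc += int_det(mat) ** 2
        count += 1
    return Fraction(acc, count)


print(mean_squared_det(2), mean_squared_det(3))
```

The cross terms between distinct permutations vanish in expectation, which is why the average equals the number of permutations.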

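Denote by $\left(AA^{\top}\right)_s$ the principal minor of $AA^{\top}$ coming from the index set $s$, and by $A_{s,t}$ the submatrix of $A$ with rows in $s$ and columns in $t$. Then $\mathbb{E}\left[E_i\left(AA^{\top}\right)\right]$ can be computed as

$$\mathbb{E}\left[E_i\left(AA^{\top}\right)\right] = \sum_{s \in \binom{[k]}{i}}\mathbb{E}\det\left(\left(AA^{\top}\right)_s\right) = \sum_{s \in \binom{[k]}{i}}\mathbb{E}\det\left(A_{s,[k]}A_{s,[k]}^{\top}\right) \overset{(a)}{=} \sum_{s \in \binom{[k]}{i}}\sum_{t \in \binom{[k]}{i}}\mathbb{E}\det\left(A_{s,t}A_{s,t}^{\top}\right) \overset{(b)}{=} \binom{k}{i}\binom{k}{i}\,i!,$$

where (a) follows from the Cauchy-Binet formula and (b) follows from (46). Combining the above results with simple algebraic manipulation yields

$$\mathbb{E}\left[\det\left(\epsilon k I + AA^{\top}\right)\right] = (\epsilon k)^{k} + \sum_{i=1}^{k}(\epsilon k)^{k-i}\binom{k}{i}\frac{k!}{(k-i)!} = k!\sum_{i=0}^{k}\binom{k}{k-i}\frac{(\epsilon k)^{k-i}}{(k-i)!} = k!\sum_{i=0}^{k}\binom{k}{i}\frac{\epsilon^{i}k^{i}}{i!}. \tag{47}$$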
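Formula (47) can be confirmed exactly for small $k$. The sketch below assumes $A$ is $k \times k$, as the $k!$ normalisation in (47) suggests, and uses independent uniform $\pm 1$ entries as a stand-in ensemble (our choice; any mean-zero, unit-variance ensemble gives the same expectation). It compares an exhaustive average of $\det(\epsilon k I + AA^{\top})$ with $k!\sum_{i}\binom{k}{i}\epsilon^{i}k^{i}/i!$ in rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb, factorial


def exact_det(mat) -> Fraction:
    """Determinant via the Leibniz formula, exact for rational entries."""
    size = len(mat)
    total = Fraction(0)
    for perm in permutations(range(size)):
        inversions = sum(
            1 for i in range(size) for j in range(i + 1, size) if perm[i] > perm[j]
        )
        term = Fraction(-1 if inversions % 2 else 1)
        for row, col in enumerate(perm):
            term *= mat[row][col]
        total += term
    return total


def expected_det_enumerated(k: int, eps: Fraction) -> Fraction:
    """Exact E det(eps*k*I + A A^T) over all k-by-k sign matrices A."""
    shift = eps * k
    acc, count = Fraction(0), 0
    for entries in product((-1, 1), repeat=k * k):
        a = [entries[r * k:(r + 1) * k] for r in range(k)]
        gram = [
            [Fraction(sum(a[i][t] * a[j][t] for t in range(k))) + (shift if i == j else 0)
             for j in range(k)]
            for i in range(k)
        ]
        acc += exact_det(gram)
        count += 1
    return acc / count


def expected_det_formula(k: int, eps: Fraction) -> Fraction:
    """Right-hand side of (47): k! * sum_i C(k, i) * eps^i * k^i / i!."""
    return factorial(k) * sum(
        Fraction(comb(k, i)) * (eps * k) ** i / factorial(i) for i in range(k + 1)
    )


eps = Fraction(1, 4)  # illustrative value
for k in (2, 3):
    print(k, expected_det_enumerated(k, eps), expected_det_formula(k, eps))
```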



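In order to bound this sum, one way is to first identify the largest term. Using the shorthand notation $f_{\epsilon}(j) := \binom{k}{j}\frac{\epsilon^{j}k^{j}}{j!}$ and $r_{\epsilon}(j+1) := \frac{f_{\epsilon}(j+1)}{f_{\epsilon}(j)}$, we can obtain

$$r_{\epsilon}(j+1) = \frac{k!\,\epsilon^{j+1}k^{j+1}}{(k-j-1)!\,(j+1)!\,(j+1)!}\cdot\frac{(k-j)!\,j!\,j!}{k!\,\epsilon^{j}k^{j}} = \frac{(k-j)k\epsilon}{(j+1)^{2}}.$$

Clearly, $r_{\epsilon}(j)$ is a decreasing function of $j$. By setting $r_{\epsilon}(j^{*}+1) = 1$, we can derive

$$\frac{(k-j^{*})k\epsilon}{(j^{*}+1)^{2}} = 1 \quad\Longrightarrow\quad j^{*} = \frac{-k\epsilon - 2 + \sqrt{4k^{2}\epsilon - 4 + (k\epsilon+2)^{2}}}{2},$$

which can be bounded by

$$k\left(\sqrt{\epsilon}-\epsilon\right) < \frac{-k\epsilon - 2 + \sqrt{4k^{2}\epsilon}}{2} < j^{*} < \frac{-k\epsilon - 2 + \sqrt{4k^{2}\epsilon} + \sqrt{(k\epsilon+2)^{2}}}{2} = k\sqrt{\epsilon}$$

for large $k$ and small $\epsilon$. Suppose that $j^{*} = \tilde{\epsilon}k$ for some $\tilde{\epsilon} \in \left[\sqrt{\epsilon}-\epsilon, \sqrt{\epsilon}\right]$, then by Stirling's formula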



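$$\begin{aligned}
\frac{1}{k}\log f_{\epsilon}(j) \le \frac{1}{k}\log f_{\epsilon}(j^{*})
&= \frac{1}{k}\log\left[\binom{k}{\tilde{\epsilon}k}\frac{\epsilon^{\tilde{\epsilon}k}k^{\tilde{\epsilon}k}}{(\tilde{\epsilon}k)!}\right] \\
&= -\tilde{\epsilon}\log\tilde{\epsilon} - (1-\tilde{\epsilon})\log(1-\tilde{\epsilon}) + \tilde{\epsilon}\log k + \tilde{\epsilon}\log\epsilon - \tilde{\epsilon}\log(\tilde{\epsilon}k) + \tilde{\epsilon} + O\left(\frac{\log k}{k}\right) \\
&\le 2\sqrt{\epsilon} + O\left(\frac{\log k}{k}\right),
\end{aligned}$$

where the last step follows from L'Hospital's rule. This together with (47) yields

$$\frac{1}{k}\log\mathbb{E}\left[\det\left(\epsilon k I + AA^{\top}\right)\right] \le \frac{1}{k}\log\left[k!\left(k f_{\epsilon}(j^{*})\right)\right] \le \frac{\log(k!) + \log k}{k} + 2\sqrt{\epsilon} + O\left(\frac{\log k}{k}\right) = \log k - 1 + 2\sqrt{\epsilon} + O\left(\frac{\log k}{k}\right).$$

Together with Jensen's inequality, one has

$$\frac{1}{k}\mathbb{E}\left[\log\det\left(\epsilon I + \frac{1}{k}AA^{\top}\right)\right] = -\log k + \frac{1}{k}\mathbb{E}\left[\log\det\left(\epsilon k I + AA^{\top}\right)\right] \le -\log k + \frac{1}{k}\log\mathbb{E}\det\left(\epsilon k I + AA^{\top}\right) \le -1 + 2\sqrt{\epsilon} + O\left(\frac{\log k}{k}\right).$$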



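(2) The lower bound follows from a concentration inequality. Define $Y := \log\det\left(\epsilon I + \frac{1}{k}AA^{\top}\right) - \mathbb{E}\left[\log\det\left(\epsilon I + \frac{1}{k}AA^{\top}\right)\right]$. Lemma 2 implies that $\mathbb{P}(|Y| > \delta) \le 4\exp\left(-c\epsilon^{2}\delta^{2}k\right)$. Denote by $f_{|Y|}(\cdot)$ the probability density function of $|Y|$; integrating by parts, we can obtain

$$\begin{aligned}
\mathbb{E}\left(e^{Y}\right) \le \mathbb{E}\left(e^{|Y|}\right)
&= \int_{0}^{\infty}e^{y}f_{|Y|}(y)\,dy = -e^{y}\,\mathbb{P}(|Y| > y)\Big|_{0}^{\infty} + \int_{0}^{\infty}e^{y}\,\mathbb{P}(|Y| > y)\,dy \\
&\le 1 + \int_{0}^{\infty}4\exp\left(y - c\epsilon^{2}ky^{2}\right)dy < 1 + 4\sqrt{\frac{\pi}{c\epsilon^{2}k}}\exp\left(\frac{1}{4c\epsilon^{2}k}\right).
\end{aligned}$$

Taking logarithms at both sides and plugging in the expression of $Y$ yields

$$\log\mathbb{E}\det\left(\epsilon I + \frac{1}{k}AA^{\top}\right) \le \mathbb{E}\log\det\left(\epsilon I + \frac{1}{k}AA^{\top}\right) + \log\left(1 + 4\sqrt{\frac{\pi}{c\epsilon^{2}k}}\exp\left(\frac{1}{4c\epsilon^{2}k}\right)\right). \tag{48}$$

Combining (48) and (46) yields that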



lE\ogdet((I+yAA^] > llogEdet ( el + I AaA + ^ - O , , 
k \ k I k \ k I k \k 






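Appendix F
Proof of Lemma 4

(1) We first develop the upper bound. The Cauchy-Binet formula indicates that

$$\mathbb{E}\left[\det\left(AA^{\top}\right)\right] = \sum_{s \in \binom{[n]}{m}}\mathbb{E}\left[\det\left(A_sA_s^{\top}\right)\right],$$

where $s$ ranges over all $m$-combinations of $\{1, \cdots, n\}$, and $A_s$ is the $m \times m$ minor of $A$ whose columns are the columns of $A$ at indices from $s$. It has been shown in (46) that for each jointly independent $m \times m$ ensemble $A_s$, the determinant satisfies

$$\mathbb{E}\det\left(A_sA_s^{\top}\right) = m!,$$

which immediately leads to

$$\mathbb{E}\det\left(\frac{1}{n}AA^{\top}\right) = \frac{1}{n^{m}}\binom{n}{m}m!.$$

Besides, using the entropy formula $\binom{n}{m} \le e^{nH(m/n)}$ and the following inequality [36, Equation 1.46]

$$\log(m!) \le \log\left(e\,m^{m+1}e^{-m}\right) = (m+1)\log m - m + 1,$$

we can obtain

$$\begin{aligned}
\frac{1}{n}\log\mathbb{E}\det\left(\frac{1}{n}AA^{\top}\right)
&\le -\frac{m}{n}\log n + \frac{(m+1)\log m}{n} - \frac{m}{n} + \frac{1}{n} + H\left(\frac{m}{n}\right) \\
&\le -\frac{m}{n}\log n + \frac{m\log m}{n} - \frac{m}{n} + \frac{2\log m}{n} + H\left(\frac{m}{n}\right) \\
&= (1-\alpha)\log\frac{1}{1-\alpha} - \alpha + \frac{2\log m}{n}.
\end{aligned}$$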
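Because $\mathbb{E}\det(\frac{1}{n}AA^{\top}) = \binom{n}{m}m!/n^{m}$ is explicit, the bound just derived can be checked directly. The following sketch evaluates both sides with log-gamma functions for a few illustrative $(m, n)$ pairs of our choosing; it is a numerical illustration, not a substitute for the argument above.

```python
import math


def log_expected_det(m: int, n: int) -> float:
    """log E det((1/n) A A^T) = log( C(n, m) * m! / n^m ), via log-gamma."""
    log_binom = math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
    return log_binom + math.lgamma(m + 1) - m * math.log(n)


def upper_bound(m: int, n: int) -> float:
    """(1 - alpha) log(1 / (1 - alpha)) - alpha + 2 log(m) / n with alpha = m / n."""
    alpha = m / n
    return (1 - alpha) * math.log(1 / (1 - alpha)) - alpha + 2 * math.log(m) / n


cases = [(50, 100), (200, 800), (900, 1000)]  # illustrative (m, n) pairs
gaps = [upper_bound(m, n) - log_expected_det(m, n) / n for m, n in cases]
print(gaps)
```

Each gap should be nonnegative and of order $\log m / n$.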

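Define $Z := \log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right) - \mathbb{E}\left[\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)\right]$; then Lemma 2 implies that $\mathbb{P}(|Z| > \tau) \le 4\exp\left(-c\epsilon^{2}\tau^{2}n\right)$. We can now derive

$$\begin{aligned}
\mathbb{E}\left(e^{Z}\right) \le \mathbb{E}\left(e^{|Z|}\right)
&= -e^{z}\,\mathbb{P}(|Z| > z)\Big|_{0}^{\infty} + \int_{0}^{\infty}e^{z}\,\mathbb{P}(|Z| > z)\,dz \\
&\le 1 + \int_{0}^{\infty}4\exp\left(z - c\epsilon^{2}nz^{2}\right)dz < 1 + 4\sqrt{\frac{\pi}{c\epsilon^{2}n}}\exp\left(\frac{1}{4c\epsilon^{2}n}\right).
\end{aligned}$$

Taking logarithms and plugging in the expression of $Z$ yields

$$\log\mathbb{E}\,{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right) \le \mathbb{E}\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right) + \log\left(1 + 4\sqrt{\frac{\pi}{c\epsilon^{2}n}}\exp\left(\frac{1}{4c\epsilon^{2}n}\right)\right), \tag{49}$$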



which leads to 



-Elogdet" i-AA'^] > -logEdet' ( -AA^ 
n \n In \n 



O ('i^l > - logEdet f IaA^ V O f ^ 
n / n \n J \ n 



n M 1 f\ogn + \oge 

(1 — a) log a — U 

I — a 



(50) 



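On the other hand, setting $\tau = 1$ in the bound $\mathbb{P}(|Z| > \tau) \le 4\exp\left(-c\epsilon^{2}\tau^{2}n\right)$ indicates that with probability at least $1 - 4\exp\left(-c\epsilon^{2}n\right)$, we have

$$\left|\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right) - \mathbb{E}\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)\right| \le 1$$

or, equivalently,

$$\frac{1}{e}\,e^{\mathbb{E}\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)} \le {\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right) \le e \cdot e^{\mathbb{E}\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)}.$$

Jensen's inequality implies that

$$e^{\mathbb{E}\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)} \le \mathbb{E}\,e^{\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)} = \mathbb{E}\,{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right),$$

and hence one also has

$$\frac{1}{e}\,e^{\mathbb{E}\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)} \le {\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right) \le e \cdot \mathbb{E}\,{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)$$

with probability exceeding $1 - 4\exp\left(-c\epsilon^{2}n\right)$.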

Besides, the tail distribution for fTmin (^) satisfies pTl, Theorem 1.1] 

^a] < e (1 - V^)) < (C?)^'"")" + e-™ 
for some constants C, c > 0. For a small constant e < ^ (1 — V^)"' taking e = rr^ gives 

cr,„in [ -AA^ ) < e ) < exp (-CmTl) 



for some constant c^ > 0. This basically implies that 

det (-AA'^j = det"^ (-AA'^j j > 1 - exp {-c^n) . 
Hence, the union bound implies that 

det (^AaA = def^ (^AaA and det^ f^^^"^) > e-^e^C^s'^^'l^^^")) 
hold with probability exceeding 1 — 2 exp (—en) for some constant c. Therefore, 

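$$\mathbb{E}\det\left(\frac{1}{n}AA^{\top}\right) \ge \left(1 - 2\exp(-cn)\right)e^{-1}e^{\mathbb{E}\left(\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)\right)},$$

which immediately implies that for sufficiently large $n$,

$$\frac{1}{n}\log\mathbb{E}\det\left(\frac{1}{n}AA^{\top}\right) \ge \frac{1}{n}\mathbb{E}\left(\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)\right) - \frac{2}{n}. \tag{51}$$

Equivalently, we have

$$\frac{1}{n}\mathbb{E}\left(\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)\right) \le \frac{1}{n}\log\mathbb{E}\det\left(\frac{1}{n}AA^{\top}\right) + \frac{2}{n} \le (1-\alpha)\log\frac{1}{1-\alpha} - \alpha + \frac{3\log m}{n}. \tag{52}$$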



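Setting $\delta = \log m / n$ in Lemma 2 then leads to

$$\mathbb{P}\left(\frac{1}{n}\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right) \le \frac{1}{n}\mathbb{E}\left(\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)\right) + \frac{\log m}{n}\right) > 1 - 4\exp\left(-c\epsilon^{2}n\log^{2}m\right).$$

Using the upper bound (52) and the fact that $\frac{1}{n}\log\det\left(\frac{1}{n}AA^{\top}\right) \le \frac{1}{n}\log{\det}^{\epsilon}\left(\frac{1}{n}AA^{\top}\right)$ immediately gives

$$\mathbb{P}\left(\frac{1}{n}\log\det\left(\frac{1}{n}AA^{\top}\right) \le (1-\alpha)\log\frac{1}{1-\alpha} - \alpha + \frac{4\log m}{n}\right) > 1 - 4\exp\left(-c\epsilon^{2}n\log^{2}m\right).$$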



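(2) In order to derive a lower bound on $\log\det\left(\frac{1}{n}AA^{\top}\right)$, it is helpful to first estimate the number of eigenvalues of $\frac{1}{n}AA^{\top}$ that are smaller than $\epsilon$, i.e. $\sum_{i=1}^{m}\mathbb{1}_{[0,\epsilon]}\left(\lambda_i\left(\frac{1}{n}AA^{\top}\right)\right)$. Since the indicator function $\mathbb{1}_{[0,\epsilon]}(\cdot)$ is discontinuous, we define instead a continuous function $g_{\epsilon}(x)$ not smaller than $\mathbb{1}_{[0,\epsilon]}(x)$ such that

$$g_{\epsilon}(x) = \begin{cases} 1, & \text{if } 0 \le x \le \epsilon; \\ -x/\epsilon + 2, & \text{if } \epsilon < x \le 2\epsilon; \\ 0, & \text{else.} \end{cases}$$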


Clearly, $g_{\epsilon}(\cdot)$ has a Lipschitz constant $1/\epsilon$ and satisfies $\mathbb{1}_{[0,\epsilon]}(x) \le g_{\epsilon}(x)$. For any small constant $0 < \epsilon < \frac{1}{2e}$, we have the following crude bound



El 



[0,. 






i=l 



(53) 



1 , 2£log5!- 



Since xlog - is an increasing function for a: < 2e < e , we have log - < ^^-^ and hence 



E i°g77i 



i:A.(iAA^)<e 



A,, ^AA' 



< 



E 



-2elog(2e) 



i:\,[^AA'^)<e ^i 



{iAA-) 



<2elog— tr 



-AA' 



This together with ( 53 i yields 



1 



1 



H;EM-^.^^^M h ^«'°ei > ^-^-^ 



1 \ 1, 



1 



2melogJj , a „ , - 

< 3elog-- 

n — m ~ 1 1 — a 2e 



where the equality above follows from the property of Wishart matrices (e.g. [42, Theorem 2.2.8]).



Note that $g_{\epsilon}(x)$ has Lipschitz constant $1/\epsilon$ and the standard Gaussian measure has logarithmic Sobolev constant $c_{\mathrm{LS}} = 1$. Applying [29, Corollary 1.8(b)] yields that for any $\delta > 0$

a 1 A 

3elog->5 

1 — a 2e / 






>6 



< 2 exp (-26^(5 



2x2_3\ 






Since g{x) is an upper bound on 1[q,^]{x), this implies that 



c^d{^\K{^AA^)<e} 3^1 

*^ ^ ^ ^ ^ <= Ino- 1- S 



n 



1 — a 2e 



(54) 



with probability exceeding 1 — 2exp {—2e^5^'m? 

Now that we have an estimate of the number of small eigenvalues, our next task is to estimate the influence of the set of small eigenvalues. It has been shown in [43, Theorem 4.5] that

/ 



a- 






A . UaA^ 



>n'^ \ < 



\i 



\ I A A^ 



^inin 1 -A-A 



T 



> n\ < 



1 /6.414 



27r V n 



n—m-\-l 



< exp {—c^n log n) . 



(55) 



for some constant $c_5 > 0$.

Now that we have a tail upper bound for the condition number, we can lower bound A,nin ( ^AA 1 by developing 
a lower bound on A^ax ( n^^ )■ ^^ setting e = n^^^"* and 6 — n^^l^ in (54i and hence Y3^eloge + (5 <C 1/2, 
we can derive a crude lower bound such that 

1 \ _/ card{z|A,(lAA^)<^} ^ 



A. 



1 . .r 



AA^ < 



,1/4 



< 



< 2 exp i^—c^'mf' 



This together with ( [55] l yields 



A • ( -AA^\ < -^ 



<P A 



-AA^ ] < 



1 



<P A 



-AA^ ] < 



ni/4 
1 



A„iin I -AA^ j < -^^ and A„ 



'-aaA> ' 



1/4 



n 



,1/4 



i^A^ 



^iTiin 1 ., J\.J\. 



>n^ 



< exp (— csn log n) 
for some constant £5 > 0. 



Set e = n ^/^, 5 — n ^1'^ logn from now on, which indicates that 

cardji : A, (^AA^) < e} /logn 



with probability at least 1 — exp {—c^n log^ n) for some constant cg > 0. Note that 

< - logdet' \-AA^\ - - logdet {-AA^\ = - V log — j-^ 

n \n In \n In ^-^ \ / 1 4 

^ ' ^ ^ j:A.(iAAr)<e '^^ (^^^A 

cardjz: A, (iAA^) < e} 



log 



(56) 



(57)
This together with (57) and (56) implies that

1 



logdet' ( -AA^] - -logdet ( -AA'^ 
n \n In \n 



O 



log n 



(58) 



with probability exceeding $1 - \exp(-c_7 n\log n)$ for some constant $c_7 > 0$.

Besides, Equation (50 1 gives a lower bound on E I ^ log det^ I ^AA j j . Setting e = n~^''^ and S = n~^'^ log ) 
in Lemma |2] leads to 



- logdet' -AA' > (l-a) log a + 

n \n I 1 — a 



log n + log e 



> 1 - 4 cxp {-en log^ n) . (59) 



Combining (58) and (59) with a union bound yields

$$\frac{1}{n}\log\det\left(\frac{1}{n}AA^{\top}\right) \ge (1-\alpha)\log\frac{1}{1-\alpha} - \alpha + O\left(\frac{\log^{2} n}{\sqrt{n}}\right)$$

with probability exceeding $1 - \exp(-c_8 n\log n)$ for some constant $c_8 > 0$.



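Appendix G
Proof of Lemma 5

Suppose that the singular value decomposition of the real-valued $A$ is given by $A = U_A\begin{bmatrix}\Sigma_A \\ 0\end{bmatrix}V_A^{\top}$, where $\Sigma_A$ is a diagonal matrix containing all $k$ singular values of $A$. One can then write

$$\begin{aligned}
\log\det\left(\epsilon I_k + A^{\top}B^{-1}A\right)
&= \log\det\left(\epsilon I_k + V_A\begin{bmatrix}\Sigma_A & 0\end{bmatrix}U_A^{\top}B^{-1}U_A\begin{bmatrix}\Sigma_A \\ 0\end{bmatrix}V_A^{\top}\right) \\
&= \log\det\left(\epsilon I_k + \Sigma_A\left(\tilde{B}^{-1}\right)_{[k]}\Sigma_A\right) \ge \log\det\left(\frac{1}{n}\Sigma_A^{2}\right) - \log\det\left\{\frac{1}{n}\left(\left(\tilde{B}^{-1}\right)_{[k]}\right)^{-1}\right\},
\end{aligned} \tag{60}$$

where $\tilde{B} := U_A^{\top}BU_A \sim \mathcal{W}_m\left(n-k, U_A^{\top}U_A\right) = \mathcal{W}_m\left(n-k, I_m\right)$ from the property of the Wishart distribution. Here, $\left(\tilde{B}^{-1}\right)_{[k]}$ denotes the leading $k \times k$ minor consisting of matrix elements of $\tilde{B}^{-1}$ in rows and columns from 1 to $k$, which is independent of $A$ by the Gaussianity property.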
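The rotation step behind (60) can be illustrated numerically. The sketch below uses our own variable names and small illustrative sizes, with `dof` standing in for $n - k$. It checks that $A^{\top}B^{-1}A$ equals $V_A\Sigma_A(\tilde{B}^{-1})_{[k]}\Sigma_AV_A^{\top}$ and that dropping the $\epsilon I_k$ term gives a lower bound on the log-determinant; the $1/n$ normalisations in (60) cancel and are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, dof, eps = 5, 3, 9, 0.05  # dof plays the role of n - k

A = rng.standard_normal((m, k))
W = rng.standard_normal((m, dof))
B = W @ W.T  # one draw from the Wishart ensemble W_m(dof, I_m)

U, sing, Vt = np.linalg.svd(A, full_matrices=True)  # U is m-by-m, sing has k entries
B_tilde = U.T @ B @ U
block = np.linalg.inv(B_tilde)[:k, :k]  # leading k-by-k block of the inverse

direct = A.T @ np.linalg.inv(B) @ A
rotated = Vt.T @ np.diag(sing) @ block @ np.diag(sing) @ Vt


def logdet(x: np.ndarray) -> float:
    """Log-determinant of a positive definite matrix."""
    sign, val = np.linalg.slogdet(x)
    assert sign > 0
    return val


lhs = logdet(eps * np.eye(k) + direct)
lower = logdet(np.diag(sing**2)) - logdet(np.linalg.inv(block))
print(lhs, lower)
```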

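Note that $\frac{1}{n}\log\det\left(\frac{1}{n}\Sigma_A^{2}\right) = \frac{1}{n}\log\det\left(\frac{1}{n}A^{\top}A\right)$. Then Lemma 4 implies that with probability exceeding $1 - \exp(-c_9 n\log n)$, one has

$$\begin{aligned}
\frac{1}{n}\log\det\left(\frac{1}{n}\Sigma_A^{2}\right)
&= \frac{1}{n}\log\det\left(\frac{1}{n}A^{\top}A\right) = \frac{m}{n}\cdot\frac{1}{m}\log\det\left(\frac{1}{m}A^{\top}A\right) + \frac{1}{n}\log\det\left(\frac{m}{n}I_k\right) \\
&\ge \alpha\left(-\left(1-\frac{\beta}{\alpha}\right)\log\left(1-\frac{\beta}{\alpha}\right) - \frac{\beta}{\alpha}\right) + \beta\log\alpha + O\left(\frac{\log^{2} n}{\sqrt{n}}\right) \\
&= -(\alpha-\beta)\log\frac{\alpha-\beta}{\alpha} - \beta + \beta\log\alpha + O\left(\frac{\log^{2} n}{\sqrt{n}}\right) \\
&= -(\alpha-\beta)\log(\alpha-\beta) - \beta + \alpha\log\alpha + O\left(\frac{\log^{2} n}{\sqrt{n}}\right).
\end{aligned} \tag{61}$$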



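On the other hand, it is well known (e.g. [42, Theorem 2.3.3]) that for a Wishart matrix $\tilde{B} \sim \mathcal{W}_m(n-k, I_m)$, $\left(\left(\tilde{B}^{-1}\right)_{[k]}\right)^{-1}$ also follows the Wishart distribution, that is, $\left(\left(\tilde{B}^{-1}\right)_{[k]}\right)^{-1} \sim \mathcal{W}_k(n-m, I_k)$. Applying Lemma 4 again yields that

$$\frac{1}{n}\log\det\left(\frac{1}{n}\left(\left(\tilde{B}^{-1}\right)_{[k]}\right)^{-1}\right) = \frac{n-m}{n}\cdot\frac{1}{n-m}\log\det\left(\frac{1}{n-m}\left(\left(\tilde{B}^{-1}\right)_{[k]}\right)^{-1}\right) + \frac{1}{n}\log\det\left(\frac{n-m}{n}I_k\right) \tag{62}$$

$$\le -(1-\alpha-\beta)\log\left(1-\frac{\beta}{1-\alpha}\right) - \beta + \beta\log(1-\alpha) + O\left(\frac{\log n}{n}\right) \tag{63}$$

holds with probability exceeding $1 - \exp(-c_9 n\log n)$.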
Combining (63), (61) and (60) leads to

$$\frac{1}{n}\log\det\left(\epsilon I_k + A^{\top}B^{-1}A\right) \ge -(\alpha-\beta)\log(\alpha-\beta) + \alpha\log\alpha + (1-\alpha-\beta)\log\left(1-\frac{\beta}{1-\alpha}\right) - \beta\log(1-\alpha) + O\left(\frac{\log^{2} n}{\sqrt{n}}\right)$$

with probability exceeding $1 - 2\exp(-c_9 n\log n)$.